Preface


This book developed out of a course I have taught since 1988 to first-year Ph.D. stu- 
dents at the University of Rochester on the use of optimization techniques in eco- 
nomic analysis. A detailed account of its contents is presented in Section 2.5 of 
Chapter 2. The discussion below is aimed at providing a broad overview of the book, 
as well as at emphasizing some of its special features. 


An Overview of the Contents 


The main body of this book may be divided into three parts. The first part, encompassing
Chapters 3 through 8, studies optimization in n-dimensional Euclidean space,
$\mathbb{R}^n$. Several topics are covered in this span. These include, but are not limited to,
(i) the Weierstrass Theorem, and the existence of solutions to optimization problems;
(ii) the Theorem of Lagrange, and necessary conditions for optima in problems with 
equality constraints; (iii) the Theorem of Kuhn and Tucker, and necessary conditions 
for optima in problems with inequality constraints; (iv) the role of convexity in ob- 
taining sufficient conditions for optima in constrained optimization problems; and 
(v) the extent to which convexity can be replaced with quasi-convexity, while still 
obtaining sufficiency of the first-order conditions for global optima. 

The second part of the book, comprised of Chapters 9 and 10, looks at the issue 
of parametric variation in optimization problems, that is, at the manner in which 
solutions to optimization problems respond to changes in the values of underlying 
parameters. Chapter 9 begins this exercise with the question of parametric continuity: 
under what conditions will solutions to optimization problems vary “continuously” 
with changes in the underlying parameters? An answer is provided in the centerpiece 
of this chapter, the Maximum Theorem. The strengthening of the Maximum Theorem 
that is obtained by adding convexity restrictions to the problem is also examined. 
Chapter 10 is concerned with parametric monotonicity, that is, with conditions under 
which increases in the values of parameters result in increases in the size of optimal 


Mathematical Preliminaries


This chapter lays the mathematical foundation for the study of optimization that 
occupies the rest of this book. It focuses on three main topics: the topological structure 
of Euclidean spaces, continuous and differentiable functions on Euclidean spaces and 
their properties, and matrices and quadratic forms. Readers familiar with real analysis 
at the level of Rudin (1976) or Bartle (1964), and with matrix algebra at the level of 
Munkres (1964) or Johnston (1984, Chapter 4), will find this chapter useful primarily 
as a refresher; for others, a systematic knowledge of its contents should significantly 
enhance understanding of the material to follow. 

Since this is not a book in introductory analysis or linear algebra, the presentation 
in this chapter cannot be as comprehensive or as leisurely as one might desire. The 
results stated here have been chosen with an eye to their usefulness towards the book's 
main purpose, which is to develop a theory of optimization in Euclidean spaces. The 
selective presentation of proofs in this chapter reveals a similar bias. Proofs whose 
formal structure bears some resemblance to those encountered in the main body of 
the text are spelt out in detail; others are omitted altogether, and the reader is given 
the choice of either accepting the concerned results on faith or consulting the more
primary sources listed alongside the result.

It would be inaccurate to say that this chapter does not presuppose any knowl- 
edge on the part of the reader, but it is true that it does not presuppose much. Ap- 
pendices A and B aim to fill in the gaps and make the book largely self-contained. 
Appendix A reviews the basic rules of propositional logic; it is taken for granted 
throughout that the reader is familiar with this material. An intuitive understanding 
of the concept of an “irrational number,” and of the relationship between rational 
and irrational numbers, suffices for this chapter and for the rest of this book. A 
formal knowledge of the real line and its properties will, however, be an obvious 
advantage, and readers who wish to acquaint themselves with this material may 
consult Appendix B. 




The discussion in this chapter takes place solely in the context of Euclidean spaces. 
This is entirely adequate for our purposes, and avoids generality that we do not need. 
However, Euclidean spaces are somewhat special in that many of their properties 
(such as completeness, or the compactness of closed and bounded sets) do not carry 
over to more general metric or topological spaces. Readers wishing to view the 
topological structure of Euclidean spaces in a more abstract context can, at a first 
pass, consult Appendix C, where the concepts of inner product, norm, metric, and 
topology are defined on general vector spaces, and some of their properties are 
reviewed. 


1.1 Notation and Preliminary Definitions 
1.1.1 Integers, Rationals, Reals, $\mathbb{R}^n$


The notation we adopt is largely standard. The set of positive integers is denoted by 
N, and the set of all integers by Z: 


$$\mathbb{N} = \{1, 2, 3, \ldots\}$$
$$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}.$$


The set of rational numbers is denoted by $\mathbb{Q}$:

$$\mathbb{Q} = \left\{ x \mid x = \frac{p}{q},\ p, q \in \mathbb{Z},\ q \neq 0 \right\}.$$


Finally, the set of all real numbers, both rational and irrational, is denoted by R. As 
mentioned earlier, it is presumed that the reader has at least an intuitive understanding 
of the real line and its properties. Readers lacking this knowledge should first review 
Appendix B. 

Given a real number $z \in \mathbb{R}$, its absolute value will be denoted $|z|$:

$$|z| = \begin{cases} z & \text{if } z \geq 0 \\ -z & \text{if } z < 0. \end{cases}$$


The Euclidean distance between two points $x$ and $y$ in $\mathbb{R}$ is defined as $|x - y|$, i.e.,
as the absolute value of their difference.

For any positive integer $n \in \mathbb{N}$, the $n$-fold Cartesian product of $\mathbb{R}$ will be denoted
$\mathbb{R}^n$. We will refer to $\mathbb{R}^n$ as n-dimensional Euclidean space. When $n = 1$, we shall
continue writing $\mathbb{R}$ for $\mathbb{R}^1$.


A point in $\mathbb{R}^n$ is a vector $x = (x_1, \ldots, x_n)$ where for each $i = 1, \ldots, n$, $x_i$ is a
real number. The number $x_i$ is called the $i$-th coordinate of the vector $x$.

We use 0 to denote the real number 0 as well as the null vector $(0, \ldots, 0) \in \mathbb{R}^n$.
This notation is ambiguous, but the correct meaning will usually be clear from the
context.
Fig. 1.1. Vector Addition and Scalar Multiplication in $\mathbb{R}^2$


Vector addition and scalar multiplication are defined in $\mathbb{R}^n$ as follows: for
$x, y \in \mathbb{R}^n$ and $a \in \mathbb{R}$,

$$x + y = (x_1 + y_1, \ldots, x_n + y_n)$$
$$ax = (ax_1, \ldots, ax_n).$$


Figure 1.1 provides a graphical interpretation of vector addition and scalar
multiplication in $\mathbb{R}^2$.


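Given any two $n$-vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, we write

$x = y$, if $x_i = y_i$, $i = 1, \ldots, n$.
$x \geq y$, if $x_i \geq y_i$, $i = 1, \ldots, n$.
$x > y$, if $x \geq y$ and $x \neq y$.
$x \gg y$, if $x_i > y_i$, $i = 1, \ldots, n$.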


Note that 


• $x \geq y$ does not preclude the possibility that $x = y$, and

• for $n > 1$, the vectors $x$ and $y$ need not be comparable under any of the categories
above; for instance, the vectors $x = (2, 1)$ and $y = (1, 2)$ in $\mathbb{R}^2$ do not satisfy
$x \geq y$, but neither is it true that $y \geq x$.
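The componentwise orderings can be made concrete with a short sketch. The Python below is illustrative only: the helper names `geq`, `gt`, and `strictly_gt` are hypothetical labels for the weak, semi-strict, and strict vector inequalities, and do not appear in the text.

```python
def geq(x, y):
    """Weak inequality: every coordinate of x is at least the matching coordinate of y."""
    return all(xi >= yi for xi, yi in zip(x, y))

def gt(x, y):
    """Semi-strict inequality: x is weakly greater than y and x differs from y."""
    return geq(x, y) and tuple(x) != tuple(y)

def strictly_gt(x, y):
    """Strict inequality: every coordinate of x exceeds the matching coordinate of y."""
    return all(xi > yi for xi, yi in zip(x, y))

# The vectors (2, 1) and (1, 2) are not comparable in either direction.
x, y = (2, 1), (1, 2)
print(geq(x, y), geq(y, x))
```

Because each check is a conjunction over coordinates, two vectors in dimension two or higher can fail every comparison at once, which is the point of the second remark.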


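The nonnegative and strictly positive orthants of $\mathbb{R}^n$, denoted $\mathbb{R}^n_+$ and $\mathbb{R}^n_{++}$,
respectively, are defined as

$$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x \geq 0\},$$

and

$$\mathbb{R}^n_{++} = \{x \in \mathbb{R}^n \mid x \gg 0\}.$$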
1.1.2 Inner Product, Norm, Metric


This subsection describes three structures on the space $\mathbb{R}^n$: the Euclidean inner
product of two vectors $x$ and $y$ in $\mathbb{R}^n$, the Euclidean norm of a vector $x$ in $\mathbb{R}^n$, and
the Euclidean metric measuring the distance between two points $x$ and $y$ in $\mathbb{R}^n$. Each
of these generalizes a familiar concept from $\mathbb{R}$. Namely, when $n = 1$, and $x$ and $y$
are just real numbers, the Euclidean inner product of $x$ and $y$ is just the product $xy$
of the numbers $x$ and $y$; the Euclidean norm of $x$ is simply the absolute value $|x|$ of
$x$; and the Euclidean distance between $x$ and $y$ is the absolute value $|x - y|$ of their
difference.

Given $x, y \in \mathbb{R}^n$, the Euclidean inner product of the vectors $x$ and $y$, denoted
$x \cdot y$, is defined as:

$$x \cdot y = \sum_{i=1}^{n} x_i y_i.$$

We shall henceforth refer to the Euclidean inner product simply as the inner product.


Theorem 1.1 The inner product has the following properties for any vectors
$x, y, z \in \mathbb{R}^n$ and scalars $a, b \in \mathbb{R}$:

1. Symmetry: $x \cdot y = y \cdot x$.
2. Bilinearity: $(ax + by) \cdot z = ax \cdot z + by \cdot z$ and $x \cdot (ay + bz) = x \cdot ay + x \cdot bz$.
3. Positivity: $x \cdot x \geq 0$, with equality holding if and only if $x = 0$.


Proof Symmetry and bilinearity are easy to verify from the definition of the inner
product. To check that positivity holds, note that the square of a real number is always
nonnegative, and can be zero if and only if the number is itself zero. It follows that
as the sum of squared real numbers, $x \cdot x = \sum_{i=1}^{n} x_i^2$ is always nonnegative, and is
zero if and only if $x_i = 0$ for each $i$, i.e., if and only if $x = 0$. □


The inner product also satisfies a very useful condition called the Cauchy-Schwarz
inequality:


Theorem 1.2 (Cauchy-Schwarz Inequality) For any $x, y \in \mathbb{R}^n$ we have

$$|x \cdot y| \leq (x \cdot x)^{1/2} (y \cdot y)^{1/2}.$$


Proof For notational ease, let $X = x \cdot x$, $Y = y \cdot y$, and $Z = x \cdot y$. Then, the result
will be proved if we show that $XY \geq Z^2$, since the required inequality will follow
simply by taking square roots on both sides.

If $x = 0$, then $Z = X = 0$, and the inequality holds trivially. Suppose, therefore,
that $x \neq 0$. Note that by the positivity property of the inner product, we must then
have $X > 0$. The positivity property also implies that for any scalar $a \in \mathbb{R}$, we have

$$\begin{aligned}
0 &\leq (ax + y) \cdot (ax + y) \\
  &= a^2 x \cdot x + 2a x \cdot y + y \cdot y \\
  &= a^2 X + 2aZ + Y.
\end{aligned}$$


In particular, this inequality must hold for $a = -Z/X$. When this value of $a$ is used
in the equation above, we obtain

$$\left(\frac{Z}{X}\right)^2 X - 2\left(\frac{Z}{X}\right) Z + Y = -\frac{Z^2}{X} + Y \geq 0,$$

or $Y \geq Z^2/X$. Since $X > 0$, this in turn implies $XY \geq Z^2$, as required. □
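As a numerical sanity check on the inequality just proved, the following sketch evaluates both sides for randomly drawn vectors. It is only an illustration: the function names `inner` and `cauchy_schwarz_gap`, the dimension, and the sampling range are all assumptions made for the example.

```python
import math
import random

def inner(x, y):
    """Euclidean inner product: the sum of coordinatewise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cauchy_schwarz_gap(x, y):
    """Return (x.x)^(1/2) * (y.y)^(1/2) - |x.y|, which the theorem says is nonnegative."""
    return math.sqrt(inner(x, x)) * math.sqrt(inner(y, y)) - abs(inner(x, y))

random.seed(0)  # fixed seed so the check is reproducible
samples = [
    ([random.uniform(-5, 5) for _ in range(4)],
     [random.uniform(-5, 5) for _ in range(4)])
    for _ in range(1000)
]
# Allow a tiny tolerance for floating-point rounding.
worst_gap = min(cauchy_schwarz_gap(x, y) for x, y in samples)
print(worst_gap >= -1e-9)
```

The gap shrinks to zero (up to rounding) when one vector is a scalar multiple of the other, for example `[1, 2]` and `[2, 4]`.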


The Euclidean norm (henceforth, simply the norm) of a vector $x \in \mathbb{R}^n$, denoted
$\|x\|$, is defined as

$$\|x\| = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}.$$


The norm is related to the inner product through the identity

$$\|x\| = (x \cdot x)^{1/2}$$

for all $x \in \mathbb{R}^n$; in particular, the Cauchy-Schwarz inequality may be written as

$$|x \cdot y| \leq \|x\| \|y\|.$$

Our next result, which describes some useful properties of the norm, uses this
observation.


Theorem 1.3 The norm satisfies the following properties at all $x, y \in \mathbb{R}^n$, and
$a \in \mathbb{R}$:

1. Positivity: $\|x\| \geq 0$, with equality if and only if $x = 0$.
2. Homogeneity: $\|ax\| = |a| \cdot \|x\|$.
3. Triangle Inequality: $\|x + y\| \leq \|x\| + \|y\|$.


Proof The positivity property of the norm follows from the positivity property of
the inner product, and the fact that $\|x\| = (x \cdot x)^{1/2}$. Homogeneity obtains since

$$\|ax\| = \left(\sum_{i=1}^{n} a^2 x_i^2\right)^{1/2} = \left(a^2 \sum_{i=1}^{n} x_i^2\right)^{1/2} = |a| \|x\|.$$
The triangle inequality is a little trickier; we will need the Cauchy-Schwarz
inequality to establish it. Observe that for any $x$ and $y$ in $\mathbb{R}^n$, we have

$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + 2 x \cdot y + \|y\|^2.$$

By the Cauchy-Schwarz inequality, $x \cdot y \leq \|x\| \|y\|$. Substituting this in the previous
equation, we obtain

$$\|x + y\|^2 \leq \|x\|^2 + 2\|x\| \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

The proof is completed by taking square roots on both sides. □
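The homogeneity and triangle-inequality properties of the norm can be checked on a concrete pair of vectors. This is a minimal sketch; the helper names `norm`, `add`, and `scale`, and the particular vectors, are assumptions chosen for the example.

```python
import math

def norm(x):
    """Euclidean norm: square root of the sum of squared coordinates."""
    return math.sqrt(sum(xi * xi for xi in x))

def add(x, y):
    """Coordinatewise vector addition."""
    return [xi + yi for xi, yi in zip(x, y)]

def scale(a, x):
    """Scalar multiplication of the vector x by the real number a."""
    return [a * xi for xi in x]

x, y, a = [3.0, 4.0], [-1.0, 2.0], -2.5

# Homogeneity: ||a x|| equals |a| ||x||.
print(math.isclose(norm(scale(a, x)), abs(a) * norm(x)))
# Triangle inequality: ||x + y|| is at most ||x|| + ||y||.
print(norm(add(x, y)) <= norm(x) + norm(y))
```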


The Euclidean distance $d(x, y)$ between two vectors $x$ and $y$ in $\mathbb{R}^n$ is given by

$$d(x, y) = \left(\sum_{i=1}^{n} (x_i - y_i)^2\right)^{1/2}.$$

The distance function $d$ is called a metric, and is related to the norm $\| \cdot \|$ through
the identity

$$d(x, y) = \|x - y\|$$

for all $x, y \in \mathbb{R}^n$.


Theorem 1.4 The metric $d$ satisfies the following properties at all $x, y, z \in \mathbb{R}^n$:
1. Positivity: $d(x, y) \geq 0$ with equality if and only if $x = y$.
2. Symmetry: $d(x, y) = d(y, x)$.
3. Triangle Inequality: $d(x, z) \leq d(x, y) + d(y, z)$.


Proof The positivity property of the metric follows from the positivity property of
the norm, and the observation that $d(x, y) = \|x - y\|$. Symmetry is immediate from
the definition. The inequality $d(x, z) \leq d(x, y) + d(y, z)$ is the same as

$$\|x - z\| \leq \|x - y\| + \|y - z\|.$$

This is just the triangle inequality for norms, which we have already established. □


The concepts of inner product, norm, and metric can be defined on any abstract 
vector space, and not just $\mathbb{R}^n$. In fact, the properties we have listed in Theorems 1.1,
1.3, and 1.4 are, in abstract vector spaces, defining characteristics of the respective 
concepts. Thus, for instance, an inner product on a vector space is defined to be any 
operator on that space that satisfies the three properties of symmetry, bilinearity, and 
positivity; while a norm on that space is defined to be any operator that meets the 
conditions of positivity, homogeneity, and the triangle inequality. For more on this, 
see Appendix C. 
1.2 Sets and Sequences in $\mathbb{R}^n$
1.2.1 Sequences and Limits 


A sequence in $\mathbb{R}^n$ is the specification of a point $x_k \in \mathbb{R}^n$ for each integer
$k \in \{1, 2, \ldots\}$. The sequence is usually written as

$$x_1, x_2, x_3, \ldots$$

or, more compactly, simply as $\{x_k\}$. Occasionally, where notational clarity will be
enhanced by this change, we will use superscripts instead of subscripts, and denote
the sequence by $\{x^k\}$.

A sequence of points $\{x_k\}$ in $\mathbb{R}^n$ is said to converge to a limit $x$ (written $x_k \to x$) if
the distance $d(x_k, x)$ between $x_k$ and $x$ tends to zero as $k$ goes to infinity, i.e., if for all
$\epsilon > 0$, there exists an integer $k(\epsilon)$ such that for all $k \geq k(\epsilon)$, we have $d(x_k, x) < \epsilon$.
A sequence $\{x_k\}$ which converges to a limit $x$ is called a convergent sequence.

For example, the sequence $\{x_k\}$ in $\mathbb{R}$ defined by $x_k = 1/k$ for all $k$ is a convergent
sequence, with limit $x = 0$. To see this, let any $\epsilon > 0$ be given. Let $k(\epsilon)$ be any
integer such that $k(\epsilon) > 1/\epsilon$. Then, for all $k \geq k(\epsilon)$, we have $d(x_k, 0) = d(1/k, 0) =
1/k \leq 1/k(\epsilon) < \epsilon$, so indeed, $x_k \to 0$.
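The choice of the threshold in this example can be traced numerically. The sketch below is illustrative: `k_of_eps` is a hypothetical helper that returns one admissible threshold (an integer strictly greater than the reciprocal of the tolerance), and the tolerance 0.01 is an arbitrary choice.

```python
import math

def k_of_eps(eps):
    """Return an integer strictly greater than 1/eps, as the argument requires."""
    return math.floor(1 / eps) + 1

eps = 0.01
k0 = k_of_eps(eps)

# Every term from index k0 onward lies within eps of the limit 0.
tail_ok = all(abs(1 / k - 0) < eps for k in range(k0, k0 + 1000))
print(k0, tail_ok)
```

Any larger integer would serve equally well as a threshold; the definition of convergence only asks that some threshold exist for each tolerance.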


Theorem 1.5 A sequence can have at most one limit. That is, if $\{x_k\}$ is a sequence
in $\mathbb{R}^n$ converging to a point $x \in \mathbb{R}^n$, it cannot also converge to a point $y \in \mathbb{R}^n$ for
$y \neq x$.


Proof This follows from a simple application of the triangle inequality. If $x_k \to x$
and $y \neq x$, then

$$d(x_k, y) \geq d(x, y) - d(x_k, x).$$

Since $d(x, y) > 0$ and $d(x_k, x) \to 0$, this inequality shows that $d(x_k, y)$ cannot go
to zero as $k$ goes to infinity, so $x_k \to y$ is impossible. □


A sequence $\{x_k\}$ in $\mathbb{R}^n$ is called a bounded sequence if there exists a real number
$M$ such that $\|x_k\| \leq M$ for all $k$. A sequence $\{x_k\}$ which is not bounded is said to
be unbounded; that is, $\{x_k\}$ is an unbounded sequence if for any $M \in \mathbb{R}$, there exists
$k(M)$ such that $\|x_{k(M)}\| > M$.


Theorem 1.6 Every convergent sequence in $\mathbb{R}^n$ is bounded.


Proof Suppose $x_k \to x$. Let $\epsilon = 1$ in the definition of convergence. Then, there
exists $k(1)$ such that for all $k \geq k(1)$, $d(x_k, x) < 1$. Since $d(x_k, x) = \|x_k - x\|$, an
application of the triangle inequality yields for any $k \geq k(1)$

$$\begin{aligned}
\|x_k\| &= \|(x_k - x) + x\| \\
        &\leq \|x_k - x\| + \|x\| \\
        &< 1 + \|x\|.
\end{aligned}$$

Now define $M$ to be the maximum of the finite set of numbers

$$\{\|x_1\|, \ldots, \|x_{k(1)-1}\|, 1 + \|x\|\}.$$

Then, $M \geq \|x_k\|$ for all $k$, completing the proof. □
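The bound constructed in this proof can be computed explicitly for a specific convergent sequence. The sketch below uses an assumed example, the real sequence whose k-th term is 1 + (-1)^k 5/k, which converges to 1; the names `x`, `limit`, and `k1` are hypothetical.

```python
def x(k):
    """k-th term of the illustrative sequence 1 + (-1)^k * 5 / k, which converges to 1."""
    return 1 + (-1) ** k * 5 / k

limit = 1.0
k1 = 6  # for k >= 6 the distance |x_k - 1| = 5/k is below 1, so k(1) = 6 works

# M is the largest of the finitely many early terms and the tail bound 1 + |limit|.
M = max([abs(x(k)) for k in range(1, k1)] + [1 + abs(limit)])
print(M)

# The bound holds along the whole sequence (checked here on a long initial stretch).
print(all(abs(x(k)) <= M for k in range(1, 5000)))
```

In this example the maximum is attained by an early term rather than by the tail bound, which shows why the finitely many terms before the threshold must be included.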


While Theorem 1.5 established that a sequence can have at most one limit,
Theorem 1.6 implies that a sequence may have no limit at all. Indeed, because every
convergent sequence must be bounded, it follows that if $\{x_k\}$ is an unbounded
sequence, then $\{x_k\}$ cannot converge. Thus, for instance, the sequence $\{x_k\}$ in $\mathbb{R}$ defined
by $x_k = k$ for all $k$ is a non-convergent sequence.¹

However, unboundedness is not the only reason a sequence may fail to converge.
Consider the following example: let $\{x_k\}$ in $\mathbb{R}$ be given by

$$x_k = \begin{cases} \dfrac{1}{k}, & k = 1, 3, 5, \ldots \\[1ex] 1 - \dfrac{1}{k}, & k = 2, 4, 6, \ldots \end{cases}$$

This sequence is bounded since we have $|x_k| \leq 1$ for all $k$. However, it does not
possess a limit. The reason here is that the odd terms of the sequence are converging
to the point 0, while the even terms are converging to the point 1. Since a sequence
can have only one limit, this sequence does not converge.
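The behavior of this bounded but non-convergent sequence is easy to see numerically. The following is a small sketch; the function name `x` and the index ranges are choices made for the illustration.

```python
def x(k):
    """k-th term: 1/k for odd k, and 1 - 1/k for even k."""
    return 1 / k if k % 2 == 1 else 1 - 1 / k

odd_terms = [x(k) for k in range(1001, 1021, 2)]   # these stay near 0
even_terms = [x(k) for k in range(1000, 1020, 2)]  # these stay near 1

# Largest deviation of the sampled odd terms from 0 and of the even terms from 1.
print(max(abs(v) for v in odd_terms), max(abs(v - 1) for v in even_terms))
```

Far out in the sequence, consecutive terms remain almost a full unit apart, so no single point can be within a small distance of all terms in the tail.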

Our next result shows that convergence of a sequence $\{x^k\}$ in $\mathbb{R}^n$ is equivalent
to convergence in each coordinate. This gives us an alternative way to establish
convergence in $\mathbb{R}^n$. We use superscripts to denote the sequence in this result to avoid
confusion between the $k$-th element $x_k$ of the sequence, and the $i$-th coordinate $x_i$ of
a vector $x$.


Theorem 1.7 A sequence $\{x^k\}$ in $\mathbb{R}^n$ converges to a limit $x$ if and only if $x_i^k \to x_i$
for each $i \in \{1, \ldots, n\}$, where $x^k = (x_1^k, \ldots, x_n^k)$ and $x = (x_1, \ldots, x_n)$.


1 This may also be shown directly: for any fixed candidate limit $x$, the distance $d(x_k, x) = |x - x_k| = |x - k|$
becomes unbounded as $k$ goes to infinity. It follows that no $x \in \mathbb{R}$ can be a limit of this sequence, and
therefore that it does not possess a limit.
Proof We will use the fact that the Euclidean distance between two points
$x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$ can be written as

$$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^2\right)^{1/2},$$

where $|x_i - y_i|$ is the Euclidean distance between $x_i$ and $y_i$ in $\mathbb{R}$.
First, suppose that $x^k \to x$. We are to show that $x_i^k \to x_i$ for each $i$, i.e., that, given
any $i$ and $\epsilon > 0$, there exists $k_i(\epsilon)$ such that for $k \geq k_i(\epsilon)$, we have $|x_i^k - x_i| < \epsilon$.
So let $\epsilon > 0$ be given. By definition of $x^k \to x$, we know that there exists $k(\epsilon)$
such that $d(x^k, x) < \epsilon$ for all $k \geq k(\epsilon)$. Therefore, for $k \geq k(\epsilon)$ and any $i$, we obtain:

$$|x_i^k - x_i| = \left(|x_i^k - x_i|^2\right)^{1/2} \leq \left(\sum_{j=1}^{n} |x_j^k - x_j|^2\right)^{1/2} = d(x^k, x) < \epsilon.$$

Setting $k_i(\epsilon) = k(\epsilon)$ for each $i$, the proof that $x_i^k \to x_i$ for each $i$ is complete.

Now, suppose that $\{x_i^k\}$ converges to $x_i$ for each $i$. Let $\epsilon > 0$ be given. We will
show that there is $k(\epsilon)$ such that $d(x^k, x) < \epsilon$ for all $k \geq k(\epsilon)$, which will establish
that $x^k \to x$.

Define $\eta = \epsilon/\sqrt{n}$. For each $i$, there exists $k_i(\eta)$ such that for $k \geq k_i(\eta)$, we
have $|x_i^k - x_i| < \eta$. Define $k(\epsilon)$ to be the maximum of the finite set of numbers
$k_1(\eta), \ldots, k_n(\eta)$. Then, for $k \geq k(\epsilon)$, we have $|x_i^k - x_i| < \eta$ for all $i$, so

$$d(x^k, x) = \left(\sum_{i=1}^{n} |x_i^k - x_i|^2\right)^{1/2} < \left(\sum_{i=1}^{n} \eta^2\right)^{1/2} = \epsilon,$$

which completes the proof. □
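Both halves of this argument rest on a two-sided comparison: each coordinate error is at most the Euclidean distance, and the Euclidean distance is at most the square root of the dimension times the largest coordinate error. The sketch below checks this on an assumed sequence in the plane; the names `dist`, `seq`, and `limit` are hypothetical.

```python
import math

def dist(x, y):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def seq(k):
    """Illustrative sequence in the plane: (1/k, 1 + 1/k^2), which converges to (0, 1)."""
    return (1 / k, 1 + 1 / k**2)

limit = (0.0, 1.0)
n = len(limit)

for k in (1, 10, 100, 1000):
    xk = seq(k)
    coord_err = max(abs(a - b) for a, b in zip(xk, limit))
    d = dist(xk, limit)
    # Coordinatewise error and Euclidean distance shrink together.
    print(k, coord_err, d, math.sqrt(n) * coord_err)
```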
Theorem 1.7 makes it easy to prove the following useful result: 


Theorem 1.8 Let $\{x^k\}$ be a sequence in $\mathbb{R}^n$ converging to a limit $x$. Suppose that
for every $k$, we have $a \leq x^k \leq b$, where $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ are
some fixed vectors in $\mathbb{R}^n$. Then, it is also the case that $a \leq x \leq b$.


Proof The theorem will be proved if we show that $a_i \leq x_i \leq b_i$ for each
$i \in \{1, \ldots, n\}$. Suppose that the result were false, so for some $i$, we had $x_i < a_i$ (say).
Since $x^k \to x$, it is the case by Theorem 1.7 that $x_j^k \to x_j$ for each $j \in \{1, \ldots, n\}$;
in particular, $x_i^k \to x_i$. But $x_i^k \to x_i$ combined with $x_i < a_i$ implies that for all large
$k$, we must have $x_i^k < a_i$. This contradicts the hypothesis that $a_i \leq x_i^k \leq b_i$ for all
$k$. A similar argument establishes that $x_i > b_i$ also leads to a contradiction. Thus,
$a_i \leq x_i \leq b_i$, and the proof is complete. □
1.2.2 Subsequences and Limit Points


Let a sequence $\{x_k\}$ in $\mathbb{R}^n$ be given. Let $m$ be any rule that assigns to each $k \in \mathbb{N}$ a
value $m(k) \in \mathbb{N}$. Suppose further that $m$ is increasing, i.e., for each $k \in \mathbb{N}$, we have
$m(k) < m(k + 1)$. Given $\{x_k\}$, we can now define a new sequence $\{x_{m(k)}\}$, whose
$k$-th element is the $m(k)$-th element of the sequence $\{x_k\}$. This new sequence is called
a subsequence of $\{x_k\}$. Put differently, a subsequence of a sequence is any infinite
subset of the original sequence that preserves the ordering of terms.

Even if a sequence $\{x_k\}$ is not convergent, it may contain subsequences that
converge. For instance, the sequence 0, 1, 0, 1, 0, 1, ... has no limit, but the subsequences
0, 0, 0, ... and 1, 1, 1, ... which are obtained from the original sequence by selecting
the odd and even elements, respectively, are both convergent.

If a sequence contains a convergent subsequence, the limit of the convergent 
subsequence is called a limit point of the original sequence. Thus, the sequence 
0, 1,0,1,0, 1,... has two limit points 0 and 1. The following result is simply a 
restatement of the definition of a limit point: 


Theorem 1.9 A point $x$ is a limit point of the sequence $\{x_k\}$ if and only if for any
$\epsilon > 0$, there are infinitely many indices $m$ for which $d(x, x_m) < \epsilon$.


Proof If $x$ is a limit point of $\{x_k\}$ then there must be a subsequence $\{x_{m(k)}\}$ that
converges to $x$. By definition of convergence, it is the case that for any $\epsilon > 0$, all
but finitely many elements of the sequence $\{x_{m(k)}\}$ must be within $\epsilon$ of $x$. Therefore,
infinitely many elements of the sequence $\{x_k\}$ must also be within $\epsilon$ of $x$.
Conversely, suppose that for every $\epsilon > 0$, there are infinitely many $m$ such that
$d(x_m, x) < \epsilon$. Define a subsequence $\{x_{m(k)}\}$ as follows: let $m(1)$ be any $m$ for
which $d(x_m, x) < 1$. Now for $k = 2, 3, \ldots$ define successively $m(k)$ to be any
$m$ that satisfies (a) $d(x, x_m) < 1/k$, and (b) $m > m(k - 1)$. This construction
is feasible, since for each $k$, there are infinitely many $m$ satisfying $d(x_m, x) <
1/k$. Moreover, the sequence $\{x_{m(k)}\}$ evidently converges to $x$, so $x$ is a limit point
of $\{x_k\}$. □
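The construction in the second half of this proof is effectively an algorithm: scan forward for the next index whose term lies within 1/k of the candidate point. The sketch below applies it to an assumed example, the real sequence with terms 1/k for odd k and 1 - 1/k for even k, with candidate limit point 1. The name `subsequence_indices` is hypothetical, and the search loop terminates only when the candidate really is a limit point.

```python
def x(k):
    """Illustrative sequence: 1/k for odd k, and 1 - 1/k for even k."""
    return 1 / k if k % 2 == 1 else 1 - 1 / k

def subsequence_indices(term, point, count):
    """Return indices m(1) < m(2) < ... < m(count) with |term(m(k)) - point| < 1/k.

    Assumes `point` is a limit point of the sequence; otherwise the inner
    search may never terminate.
    """
    indices = []
    m = 0
    for k in range(1, count + 1):
        m += 1  # condition (b): move strictly past the previous index
        while abs(term(m) - point) >= 1 / k:  # condition (a): get within 1/k
            m += 1
        indices.append(m)
    return indices

idx = subsequence_indices(x, 1.0, 8)
print(idx)
print([abs(x(m) - 1.0) for m in idx])
```

This version always takes the first admissible index, but the proof allows any admissible index, so the resulting subsequence is not unique.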


If a sequence $\{x_k\}$ is convergent (say, to a limit $x$), then it is apparent that every
subsequence of $\{x_k\}$ must converge to $x$. It is less obvious, but also true, that if every
subsequence $\{x_{m(k)}\}$ of a given sequence $\{x_k\}$ converges to the limit $x$, then $\{x_k\}$
itself converges to $x$. We do not offer a proof of this fact here, since it may be easily
derived as a consequence of other considerations. See Corollary 1.19 below.

In general, a sequence $\{x_k\}$ may have any number of limit points. For instance,
every positive integer arises as a limit point of the sequence

$$1, 1, 2, 1, 2, 3, 1, 2, 3, 4, \ldots$$
Of course, it is also possible that no subsequence of a given sequence converges,
so a given sequence may have no limit points at all. A simple example is the sequence
$\{x_k\}$ in $\mathbb{R}$ defined by $x_k = k$ for all $k$: every subsequence of this sequence diverges
to $+\infty$.


1.2.3 Cauchy Sequences and Completeness 


A sequence $\{x_k\}$ in $\mathbb{R}^n$ is said to satisfy the Cauchy criterion if for all $\epsilon > 0$ there
is an integer $k(\epsilon)$ such that for all $m, l \geq k(\epsilon)$, we have $d(x_m, x_l) < \epsilon$. Informally,
a sequence $\{x_k\}$ satisfies the Cauchy criterion if, by choosing $k$ large enough, the
distance between any two elements $x_m$ and $x_l$ in the “tail” of the sequence

$$x_k, x_{k+1}, x_{k+2}, \ldots$$

can be made as small as desired. A sequence which satisfies the Cauchy criterion is
called a Cauchy sequence.

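An example of a Cauchy sequence is given by the sequence $\{x_k\}$ in $\mathbb{R}$ defined by
$x_k = 1/k^2$ for all $k$. To check that this sequence does, in fact, satisfy the Cauchy
criterion, let any $\epsilon > 0$ be given. Let $k(\epsilon)$ be any integer $k$ that satisfies $k^2 \epsilon > 2$. For
$m, l \geq k(\epsilon)$, we have

$$d(x_m, x_l) = \left|\frac{1}{m^2} - \frac{1}{l^2}\right| \leq \frac{1}{m^2} + \frac{1}{l^2} \leq \frac{1}{[k(\epsilon)]^2} + \frac{1}{[k(\epsilon)]^2} = \frac{2}{[k(\epsilon)]^2},$$

and this last term is less than $\epsilon$ by choice of $k(\epsilon)$.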
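The threshold used in this example can be computed and the Cauchy criterion checked on a finite stretch of the tail. This is a sketch under stated assumptions: `k_of_eps` is a hypothetical helper returning the smallest integer k with k^2 times the tolerance exceeding 2, and only a window of 60 consecutive indices is tested.

```python
def x(k):
    """k-th term of the sequence 1/k^2."""
    return 1 / k**2

def k_of_eps(eps):
    """Smallest positive integer k satisfying k * k * eps > 2."""
    k = 1
    while k * k * eps <= 2:
        k += 1
    return k

eps = 1e-3
k0 = k_of_eps(eps)

# Largest pairwise distance among 60 consecutive terms starting at index k0.
tail = range(k0, k0 + 60)
worst = max(abs(x(m) - x(l)) for m in tail for l in tail)
print(k0, worst < eps)
```

Note that the check never refers to the limit of the sequence, only to distances between its own terms.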

Our first result deals with the analog of Theorem 1.7 for Cauchy sequences. It
establishes that a sequence in $\mathbb{R}^n$ is a Cauchy sequence if and only if each of the
coordinate sequences is a Cauchy sequence in $\mathbb{R}$.


Theorem 1.10 A sequence $\{x^k\}$ in $\mathbb{R}^n$ is a Cauchy sequence if and only if for each
$i \in \{1, \ldots, n\}$, the sequence $\{x_i^k\}$ is a Cauchy sequence in $\mathbb{R}$.


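Proof Let $\{x^k\}$ be a Cauchy sequence in $\mathbb{R}^n$. We are to show that for any $i$ and any
$\epsilon > 0$, there is $k_i(\epsilon)$ such that for all $m, l \geq k_i(\epsilon)$, we have $|x_i^m - x_i^l| < \epsilon$. So let
$\epsilon > 0$ and $i$ be given. Since $\{x^k\}$ is a Cauchy sequence, there is $k(\epsilon)$ such that for all
$m, l \geq k(\epsilon)$, we have $d(x^m, x^l) < \epsilon$. Therefore, for any $m, l \geq k(\epsilon)$, we have

$$|x_i^m - x_i^l| = \left(|x_i^m - x_i^l|^2\right)^{1/2} \leq \left(\sum_{j=1}^{n} |x_j^m - x_j^l|^2\right)^{1/2} = d(x^m, x^l) < \epsilon.$$

By setting $k_i(\epsilon) = k(\epsilon)$, the proof that $\{x_i^k\}$ is a Cauchy sequence for each $i$ is
complete.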
Now suppose $\{x_i^k\}$ is a Cauchy sequence for each $i$. We are to show that for any
$\epsilon > 0$, there is $k(\epsilon)$ such that $m, l \geq k(\epsilon)$ implies $d(x^m, x^l) < \epsilon$. So let $\epsilon > 0$
be given. Define $\eta = \epsilon/\sqrt{n}$. Since each $\{x_i^k\}$ is a Cauchy sequence, there
exists $k_i(\eta)$ such that for all $m, l \geq k_i(\eta)$, we have $|x_i^m - x_i^l| < \eta$. Define $k(\epsilon) =
\max\{k_1(\eta), \ldots, k_n(\eta)\}$. Then, for $m, l \geq k(\epsilon)$, we have

$$d(x^m, x^l) = \left(\sum_{i=1}^{n} |x_i^m - x_i^l|^2\right)^{1/2} < \left(\sum_{i=1}^{n} \eta^2\right)^{1/2} = \epsilon,$$

and the proof is complete. □


Observe that the important difference between the definition of a convergent se- 
quence and that of a Cauchy sequence is that the limit of the sequence is explicitly 
involved in the former, but plays no role in the latter. Our next result shows, however, 
that the two concepts are intimately linked: 


Theorem 1.11 A sequence $\{x_k\}$ in $\mathbb{R}^n$ is a Cauchy sequence if and only if it is a
convergent sequence, i.e., if and only if there is $x \in \mathbb{R}^n$ such that $x_k \to x$.


Proof The proof that every convergent sequence must also be a Cauchy sequence is
simple. Suppose $x_k \to x$. Let $\epsilon > 0$ be given. Define $\eta = \epsilon/2$. Since $x_k \to x$, there
exists $k(\eta)$ such that for all $j \geq k(\eta)$, we have $d(x_j, x) < \eta$. It follows by using the
triangle inequality that for all $j, l \geq k(\eta)$, we have

$$d(x_j, x_l) \leq d(x_j, x) + d(x_l, x) < \eta + \eta = \epsilon.$$

Thus, setting $k(\epsilon) = k(\eta)$, the proof that $\{x_k\}$ is a Cauchy sequence is complete.
The proof that every Cauchy sequence must converge unfortunately requires more
apparatus than we have built up. In particular, it requires a formal definition of the
notion of a “real number,” and the properties of real numbers. Appendix B,
which presents such a formal description, proves that every Cauchy sequence in $\mathbb{R}$
must converge. An appeal to Theorem 1.10 then completes the proof. □


Any metric space which has the property that every Cauchy sequence is also a 
convergent sequence is said to be complete. Thus, by Theorem 1.11, $\mathbb{R}^n$ is complete.
It is important to note that not all metric spaces are complete (so the definition is not 
vacuous). For more on this, see Appendix C. 

Even without appealing to the completeness of $\mathbb{R}^n$, it is possible to show that a
Cauchy sequence must possess two properties that all convergent sequences must 
have: namely, that it must be bounded, and that it has at most one limit point. 
Theorem 1.12 Let $\{x_k\}$ be a Cauchy sequence in $\mathbb{R}^n$. Then,


1. $\{x_k\}$ is bounded.
2. $\{x_k\}$ has at most one limit point.


Proof To see that every Cauchy sequence $\{x_k\}$ in $\mathbb{R}^n$ must be bounded, take $\epsilon = 1$ in
the Cauchy criterion. Then, there exists an integer $k(1)$ such that for all $j, l \geq k(1)$,
we have $d(x_j, x_l) < 1$. An application of the triangle inequality now implies for all
$j \geq k(1)$ that

$$\begin{aligned}
\|x_j\| &= \|(x_j - x_{k(1)}) + x_{k(1)}\| \\
        &\leq \|x_j - x_{k(1)}\| + \|x_{k(1)}\| \\
        &< 1 + \|x_{k(1)}\|.
\end{aligned}$$

Let $M$ be the maximum of the finite set of numbers

$$\{\|x_1\|, \ldots, \|x_{k(1)-1}\|, 1 + \|x_{k(1)}\|\}.$$

Then, by construction, $M \geq \|x_k\|$ for all $k$, and the proof is complete.

To see that a Cauchy sequence cannot have two or more limit points, suppose
that $x$ is a limit point of the Cauchy sequence $\{x_k\}$. We will show that $x_k \to x$. Let
$\epsilon > 0$ be given, and let $\eta = \epsilon/2$. Since $\{x_k\}$ is Cauchy, there exists $k(\eta)$ such that
$d(x_j, x_l) < \eta$ for all $j, l \geq k(\eta)$. Moreover, since $x$ is a limit point of $\{x_k\}$, there are
elements of the sequence $\{x_k\}$ that lie arbitrarily close to $x$; in particular, we can find
$m \geq k(\eta)$ such that $d(x_m, x) < \eta$. Therefore, if $j \geq k(\eta)$,

$$\begin{aligned}
d(x_j, x) &\leq d(x_j, x_m) + d(x_m, x) \\
          &< \eta + \eta \\
          &= \epsilon.
\end{aligned}$$

Since $\epsilon > 0$ was arbitrary, this string of inequalities shows precisely that $x_k \to x$. □


Finally, two Cauchy sequences $\{x_k\}$ and $\{y_k\}$ are said to be equivalent if for all
$\epsilon > 0$, there is $k(\epsilon)$ such that for all $j \geq k(\epsilon)$ we have $d(x_j, y_j) < \epsilon$. We write
this as $\{x_k\} \sim \{y_k\}$. It is easy to see that $\sim$ is, in fact, an equivalence relation.
It is reflexive ($\{x_k\} \sim \{x_k\}$) and symmetric ($\{x_k\} \sim \{y_k\}$ implies $\{y_k\} \sim \{x_k\}$). It
is also transitive: $\{x_k\} \sim \{y_k\}$ and $\{y_k\} \sim \{z_k\}$ implies $\{x_k\} \sim \{z_k\}$. To see this,
let $\epsilon > 0$ be given. Let $\eta = \epsilon/2$. Since $\{x_k\} \sim \{y_k\}$, there is $k_1(\eta)$ such that for
all $k \geq k_1(\eta)$, we have $d(x_k, y_k) < \eta$. Similarly, there is $k_2(\eta)$ such that for all
$k \geq k_2(\eta)$, $d(y_k, z_k) < \eta$. Let $k(\epsilon) = \max\{k_1(\eta), k_2(\eta)\}$. Then, for $k \geq k(\epsilon)$, we
have $d(x_k, z_k) \leq d(x_k, y_k) + d(y_k, z_k) < \eta + \eta = \epsilon$, and we indeed have $\{x_k\} \sim \{z_k\}$.
Equivalent Cauchy sequences must have the same limit.² For, suppose $\{x_k\} \sim \{y_k\}$
and $x_k \to x$. Given $\epsilon > 0$, there is $k_1(\epsilon)$ such that $k \geq k_1(\epsilon)$ implies $d(x_k, x) < \epsilon/2$.
There is also $k_2(\epsilon)$ such that $k \geq k_2(\epsilon)$ implies $d(x_k, y_k) < \epsilon/2$. Therefore, letting
$k(\epsilon) = \max\{k_1(\epsilon), k_2(\epsilon)\}$, $k \geq k(\epsilon)$ implies $d(x, y_k) \leq d(x, x_k) + d(x_k, y_k) <
\epsilon/2 + \epsilon/2 = \epsilon$, which states precisely that $y_k \to x$.


1.2.4 Suprema, Infima, Maxima, Minima 


Let $A$ be a nonempty subset of $\mathbb{R}$. The set of upper bounds of $A$, denoted $U(A)$, is
defined as

$$U(A) = \{u \in \mathbb{R} \mid u \geq a \text{ for all } a \in A\}$$

while the set of lower bounds of $A$, denoted $L(A)$, is given by

$$L(A) = \{l \in \mathbb{R} \mid l \leq a \text{ for all } a \in A\}.$$

In general, $U(A)$ and/or $L(A)$ could be empty. For instance, if $A = \mathbb{N}$, the set of
positive integers, then $U(A)$ is empty; if $A = \mathbb{Z}$, the set of all integers, then both
$U(A)$ and $L(A)$ are empty. If $U(A)$ is nonempty, then $A$ is said to be bounded above.
If $L(A)$ is nonempty, then $A$ is said to be bounded below.

The supremum of $A$, written $\sup A$, is defined to be the least upper bound of $A$.
Namely, if $U(A)$ is nonempty, then $\sup A$ is defined to be the unique point $a^* \in U(A)$
such that $a^* \leq u$ for all $u \in U(A)$. If $U(A)$ is empty, on the other hand, then $A$ has
no finite upper bounds, so, by convention, we set $\sup A = +\infty$.

Similarly, the infimum of $A$, denoted $\inf A$, is defined to be the greatest lower bound
of $A$. That is, when $L(A)$ is nonempty, then $\inf A$ is the unique point $\hat{a} \in L(A)$ such
that $\hat{a} \geq l$ for all $l \in L(A)$. If $L(A)$ is empty, then $A$ admits no finite lower bounds,
so, by convention, we set $\inf A = -\infty$.


Theorem 1.13 If $U(A)$ is nonempty, the supremum of $A$ is well defined, i.e., there
is $a^* \in U(A)$ such that $a^* \leq u$ for all $u \in U(A)$. Similarly, if $L(A)$ is nonempty, the
infimum of $A$ is well defined, i.e., there is $\hat{a} \in L(A)$ such that $\hat{a} \geq l$ for all $l \in L(A)$.


Remark 1 By our conventions that $\sup A = +\infty$ when $U(A)$ is empty and $\inf A =
-\infty$ when $L(A)$ is empty, this will establish that $\sup A$ and $\inf A$ are defined for any
nonempty set $A \subset \mathbb{R}$. □


Remark 2 To avoid legitimate confusion, we should stress the point that some 


2 In fact, the procedure of constructing the real numbers from the rational numbers using Cauchy sequences
of rationals (see Appendix B) defines a real number to be an equivalence class of Cauchy sequences of
rational numbers.
Fig. 1.2. Constructing the Supremum


authors (e.g., Apostol, 1967, or Bartle, 1964) take an axiomatic approach to the 
real line. In this approach, Theorem 1.13 is an axiom, indeed, a key axiom. Other 
authors adopt a constructive approach to the real line, building the real numbers from 
the rationals by using, for example, the method of Dedekind cuts (Rudin, 1976), or 
equivalent Cauchy sequences of rationals (Hewitt and Stromberg, 1965; Strichartz, 
1982; Appendix B in this book). In this case, the result that bounded sets have
well-defined suprema is genuinely a “theorem.” We implicitly adopt the latter approach
in this chapter. □


Proof We prove that $\sup A$ is well defined whenever $A$ is bounded above. A similar
procedure can be employed to show that $\inf A$ is well defined whenever $A$ is bounded
below. The details are left as an exercise.

So suppose $A$ is bounded above and $U(A)$ is nonempty. We will construct two
sequences $\{a_k\}$ and $\{u_k\}$ that have a common limit $a^*$. The sequence $\{a_k\}$ will consist
entirely of points from $A$ and will converge “upwards” to $a^*$, while the sequence
$\{u_k\}$ will consist solely of points from $U(A)$, and will converge “downwards” to $a^*$.
It will follow quite easily from the construction of these sequences that the common
limit $a^*$ is, in fact, $\sup A$.

The required sequences are constructed using a divide-and-conquer procedure. 
Pick any point a; in A and any point uy in U(A). Letz; = (ay -t yp )/2 be theii 
midpoint. Of course, a} < z} < uy. There are two possible cases: z} € U(A) and 
zı ¢ U(A). In case 1, where z; € U(A), set a2 = a, and u2 = z;. Incase 2, where 
zı ¢ U(A), there must exist some point a € A such that a > zi> In this case, let 
ay = a and u2 = uy (see Figure 1.2). 


³Note that z_1 ∉ U(A) does not imply that z_1 ∈ A. For instance, if A = (0,1) ∪ [3,4] and z_1 = 2, then
z_1 ∉ U(A) and z_1 ∉ A.

Note that in either case we have a_2 ∈ A and u_2 ∈ U(A). Moreover, we must
also have a_1 ≤ a_2 and u_1 ≥ u_2. Finally, in the first case, we have d(a_2, u_2) =
d(a_1, (a_1 + u_1)/2) = d(a_1, u_1)/2, while in the second case, we have d(a_2, u_2) =
d(a, u_1) ≤ d(a_1, u_1)/2, since a > (a_1 + u_1)/2. Thus, in either case, we have
d(a_2, u_2) ≤ d(a_1, u_1)/2.

We iterate on this procedure. Let z_2 = (a_2 + u_2)/2 be the midpoint of a_2 and u_2. If
z_2 ∈ U(A), let u_3 = z_2 and a_3 = a_2. If z_2 ∉ U(A), then there must exist a′ ∈ A such
that a′ > z_2; in this case, set a_3 = a′ and u_3 = u_2. Then, it is once again true that the
following three conditions hold in either case: first, we have a_3 ∈ A and u_3 ∈ U(A);
second, a_3 ≥ a_2 and u_3 ≤ u_2; finally, d(a_3, u_3) ≤ d(a_2, u_2)/2 ≤ d(a_1, u_1)/4.

Continuing in this vein, we obtain sequences {a_k} and {u_k} such that

1. for each k, a_k ∈ A and u_k ∈ U(A);
2. {a_k} is a nondecreasing sequence, and {u_k} is a nonincreasing sequence; and
3. d(a_k, u_k) ≤ d(a_1, u_1)/2^(k−1).

Property 2 implies in particular that {a_k} and {u_k} are bounded sequences. It now
follows from property 3 that {a_k} and {u_k} are equivalent Cauchy sequences, and,
therefore, that they have the same limit a*. We will show that a* = sup A by showing
first that a* ∈ U(A), and then that a* ≤ u for all u ∈ U(A).

Pick any a ∈ A. Since u_k ∈ U(A) for each k, we have u_k ≥ a for each k, so
a* = lim_k u_k ≥ a. Since a ∈ A was arbitrary, this inequality implies a* ∈ U(A).
Now pick any u ∈ U(A). Since a_k ∈ A for each k, we must have u ≥ a_k for each k.
Therefore, we must also have u ≥ lim_k a_k = a*. Since u ∈ U(A) was arbitrary, this
inequality implies u ≥ a* for all u ∈ U(A).

Summing up, a* ∈ U(A) and a* ≤ u for all u ∈ U(A). By definition, this means
a* = sup A, and the proof is complete. □
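The divide-and-conquer construction in this proof can be traced numerically. The sketch below is only an illustration under simplifying assumptions: A is taken to be a finite list of floats, so that the test "z ∈ U(A)" can be carried out by direct comparison, and the function name bracket_supremum and its arguments are hypothetical.

```python
def bracket_supremum(points, a1, u1, steps=50):
    """Run the midpoint construction on a finite set `points`.

    a1 must belong to `points` and u1 must be an upper bound of it.
    Returns the final pair (a_k, u_k).
    """
    a, u = a1, u1
    for _ in range(steps):
        z = (a + u) / 2                      # midpoint z_k
        if all(p <= z for p in points):      # case 1: z_k is an upper bound
            u = z
        else:                                # case 2: some point of A exceeds z_k
            a = next(p for p in points if p > z)
    return a, u


A = [0.5, 1.25, 3.0, 2.0]
a_k, u_k = bracket_supremum(A, a1=0.5, u1=10.0)
print(a_k, u_k)   # the two values bracket sup A, which here is max(A) = 3.0
```

For a finite set the supremum is simply the maximum, so the example only shows how the gap d(a_k, u_k) is halved at each step; it does not replace the limiting argument of the proof.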


The following result is an immediate consequence of the definition of the supre- 
mum. We raise it to the level of a theorem, since it is an observation that comes in 
handy quite frequently: 


Theorem 1.14 Suppose sup A is finite. Then, for any ε > 0, there is a(ε) ∈ A such
that a(ε) > sup A − ε.


Proof For notational ease, let a* = sup A. Suppose the theorem failed for some
ε > 0, that is, suppose there were ε > 0 such that a ≤ a* − ε for all a ∈ A. Then,
a* − ε would be an upper bound of A. But this violates the definition of a* as the
least upper bound, since a* − ε is obviously strictly smaller than a*. □


A similar result to Theorem 1.14 evidently holds for the infimum. It is left to the 
reader as an exercise to fill in the details. 




Two concepts closely related to the supremum and the infimum are the maximum
and the minimum of a nonempty set A ⊂ R. The maximum of A, denoted max A,
is defined as a point z ∈ A such that z ≥ a for all a ∈ A. The minimum of A,
denoted min A, is defined as a point w ∈ A such that w ≤ a for all a ∈ A. By
definition, the maximum must be an upper bound of A, and the minimum must be a
lower bound of A. Therefore, we can equivalently define max A = A ∩ U(A), and
min A = A ∩ L(A).

It is very important to point out that while sup A and inf A are always defined for
any nonempty set A (they could be infinite), A ∩ U(A) and A ∩ L(A) could both
be empty, so max A and min A need not always exist. This is true even if sup A and
inf A are both finite. For instance, if A = (0, 1), we have U(A) = {x | x ≥ 1} and
L(A) = {x | x ≤ 0}, so sup A = 1 and inf A = 0, but max A and min A do not exist.
Indeed, it follows from the definition that if max A (resp. min A) is well defined, we
must have max A = sup A (resp. inf A = min A), so that max A exists if, and only
if, sup A ∈ A (resp. min A exists if and only if inf A ∈ A).


1.2.5 Monotone Sequences in R 


A sequence {x_k} in R is said to be a monotone increasing sequence if it is the case
that

\[ x_{k+1} \ge x_k \quad \text{for all } k. \]

It is monotone decreasing if

\[ x_{k+1} \le x_k \quad \text{for all } k. \]

We will also refer to monotone increasing sequences as nondecreasing sequences,
and to monotone decreasing sequences as nonincreasing sequences.

Monotone sequences possess a particularly simple asymptotic (i.e., limiting) struc- 
ture, and this is one of the reasons they are of special interest. To state the formal 
result requires one more definition. 

Say that a sequence {x_k} in R diverges to +∞ (written x_k ↑ +∞) if for all positive
integers p ∈ N, there is k(p) such that for all k ≥ k(p), we have x_k ≥ p; and that
{x_k} diverges to −∞ (written x_k ↓ −∞) if for any positive integer p ∈ N, there
exists k(p) such that for all k ≥ k(p), we have x_k ≤ −p.

Observe that while a sequence that diverges to +∞ must necessarily be unbounded,
the converse is not always true: the sequence {x_k} defined by

\[ x_k = \begin{cases} 1, & \text{if } k \text{ is odd} \\ k, & \text{if } k \text{ is even} \end{cases} \]

is an unbounded sequence but it does not diverge to +∞ (why?). On the other hand, it
is true that if {x_k} is an unbounded sequence, it must contain at least one subsequence
that diverges (to either +∞ or −∞).
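A few terms of this sequence make the distinction concrete. The snippet below is a small illustration (the helper name x and the horizon of 20 terms are arbitrary choices): the even-indexed terms grow without bound, while the odd-indexed terms stay at 1, so no k(p) can work for p = 2.

```python
def x(k):
    """x_k = 1 if k is odd, k if k is even (k = 1, 2, 3, ...)."""
    return 1 if k % 2 == 1 else k


terms = [x(k) for k in range(1, 21)]
odd_terms = terms[0::2]    # x_1, x_3, x_5, ...
even_terms = terms[1::2]   # x_2, x_4, x_6, ...

print(max(terms))          # grows with the horizon: the sequence is unbounded
print(set(odd_terms))      # the odd terms never reach p = 2
```

The even-indexed terms form a subsequence that diverges to +∞, in line with the last sentence above.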

The following result classifies the asymptotic behavior of monotone increasing 
sequences. 


Theorem 1.15 Let {x_k} be a monotone increasing sequence in R. If {x_k} is
unbounded, it must diverge to +∞. If {x_k} is bounded, it must converge to the limit x,
where x is the supremum of the set of points {x_1, x_2, ...}.


Proof First suppose that {x_k} is an unbounded sequence. Then, for any p ∈ R, there
exists an integer k(p) such that x_{k(p)} ≥ p. Since {x_k} is monotone increasing, it is
now the case that for any k ≥ k(p), we have x_k ≥ x_{k(p)} ≥ p. This says precisely
that {x_k} diverges to +∞.

Next suppose that {x_k} is a bounded sequence. We will show that x_k → x, where,
as defined in the statement of the theorem, x is the supremum of the set of points
{x_1, x_2, ...}. Let ε > 0 be given. The proof will be complete if we show that there is
k(ε) such that d(x, x_k) < ε for all k ≥ k(ε).

Since x is the least upper bound of the set of points {x_1, x_2, ...}, x − ε is not an
upper bound of this set. Therefore, there exists k(ε) such that x_{k(ε)} > x − ε. Since
{x_k} is monotone increasing, it follows that x_k > x − ε for all k ≥ k(ε). On the other
hand, since x is an upper bound, it is certainly true that x ≥ x_k for all k. Combining
these statements, we have

\[ x_k \in (x - \epsilon, x], \quad k \ge k(\epsilon), \]

which, of course, implies that d(x, x_k) < ε for all k ≥ k(ε). □
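The second half of the proof can be checked on a concrete bounded monotone sequence. The sketch below assumes the example x_k = 1 − 1/k, whose supremum is x = 1; the helper k_eps is a hypothetical name for a brute-force search for the index k(ε) used in the proof, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def x(k):
    """The bounded, monotone increasing sequence x_k = 1 - 1/k."""
    return 1 - Fraction(1, k)


def k_eps(eps):
    """Smallest k with x_k > 1 - eps, found by direct search."""
    k = 1
    while not x(k) > 1 - eps:
        k += 1
    return k


eps = Fraction(1, 100)
K = k_eps(eps)
print(K)   # from this index on, every term lies in (1 - eps, 1]
```

Once one term exceeds x − ε, monotonicity keeps all later terms above x − ε, which is exactly the step the proof relies on.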


Since a sequence {x_k} in R is monotone increasing if and only if {−x_k} is monotone
decreasing, Theorem 1.15 has the following immediate corollary:


Corollary 1.16 Let {x_k} be a monotone decreasing sequence in R. If {x_k} is
unbounded, it must diverge to −∞. If {x_k} is bounded, it must converge to the limit x,
where x is the infimum of the set of points {x_1, x_2, ...}.


1.2.6 The Lim Sup and Lim Inf 


We now define the important concepts of the “lim sup” and the “lim inf” of a
real-valued sequence. Throughout this discussion, we will assume as a convention that
the values ±∞ are allowed as “limit points” of a sequence of real numbers {x_k} in
the following sense: the sequence {x_k} will be said to have the limit point +∞ if
{x_k} contains a subsequence {x_{m(k)}} that diverges to +∞, and to have the limit point
−∞ if {x_k} contains a subsequence {x_{l(k)}} that diverges to −∞. In particular, if the
sequence {x_k} itself diverges to +∞ (resp. −∞), then we will refer to +∞ (resp.
−∞) as the limit of the sequence {x_k}.

So let a sequence {x_k} in R be given. The lim sup of the sequence {x_k} is then
defined as the limit as k → ∞ of the sequence {a_k} defined by

\[ a_k = \sup\{x_k, x_{k+1}, x_{k+2}, \ldots\}, \]

and is usually abbreviated as lim_{k→∞} sup_{l≥k} x_l, or in more compact notation, as
simply lim sup_{k→∞} x_k. To see that the lim sup is always well defined, note that there
are only three possibilities that arise concerning the sequence {a_k}. First, it could be
that a_k = +∞ for some k, in which case a_k = +∞ for all k. If this case is ruled
out (so a_k is finite for each k), the sequence {a_k} must satisfy a_{k+1} ≤ a_k for all k,
since a_{k+1} is the supremum over a smaller set; that is, {a_k} must be a nonincreasing
sequence in R. By Corollary 1.16, this specifies the only two remaining possibilities:
either {a_k} is unbounded, in which case a_k ↓ −∞, or {a_k} is bounded, in which case
{a_k} converges to a limit a in R. In all three cases, therefore, the limit of a_k is well
defined (given our convention that ±∞ may be limiting values), and this limit is, of
course, lim sup_{k→∞} x_k. Thus, the lim sup of any real-valued sequence always exists.

In a similar vein, the lim inf of the sequence {x_k} is defined as the limit as k → ∞
of {b_k}, where

\[ b_k = \inf\{x_k, x_{k+1}, x_{k+2}, \ldots\}, \]

and is usually denoted lim_{k→∞} inf_{l≥k} x_l, or just lim inf_{k→∞} x_k. Once again, either
b_k = −∞ for some k (in which case b_k = −∞ for all k), or {b_k} is a nondecreasing
sequence in R. Therefore, lim inf_{k→∞} x_k is always well defined for any real-valued
sequence {x_k}, although it could also take on infinite values.
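The tail suprema a_k and tail infima b_k can be tabulated for a concrete sequence. The sketch below assumes the example x_k = (−1)^k (1 + 1/k), for which lim sup = 1 and lim inf = −1. It works with a finite truncation of N terms, which is an assumption of the illustration; for this particular sequence the truncated tails give the exact values of a_k and b_k, because each tail attains its supremum and infimum within its first two terms.

```python
from fractions import Fraction


def x(k):
    """x_k = (-1)^k (1 + 1/k), k = 1, 2, 3, ..."""
    return (-1) ** k * (1 + Fraction(1, k))


N = 200
xs = [x(k) for k in range(1, N + 1)]

# Tail suprema a_k and tail infima b_k over x_k, ..., x_N, for k = 1, ..., N - 2.
a = [max(xs[k - 1:]) for k in range(1, N - 1)]
b = [min(xs[k - 1:]) for k in range(1, N - 1)]

print(float(a[0]), float(a[-1]))   # a_k decreases toward 1
print(float(b[0]), float(b[-1]))   # b_k increases toward -1
```

The lists a and b are nonincreasing and nondecreasing respectively, as the discussion above predicts.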

The following result establishes two important facts: first, that the lim sup and
lim inf of a sequence {x_k} are themselves limit points of the sequence {x_k}, and
second, that the lim sup is actually the supremum of the set of limit points of {x_k},
and the lim inf is the infimum of this set. This second part gives us an alternative
interpretation of the lim sup and lim inf.


Theorem 1.17 Let {x_k} be a real-valued sequence, and let A denote the set of all
limit points of {x_k} (including ±∞ if {x_k} contains such divergent subsequences).
For notational convenience, let a = lim sup_{k→∞} x_k and b = lim inf_{k→∞} x_k. Then:

1. There exist subsequences m(k) and l(k) of k such that x_{m(k)} → a and
   x_{l(k)} → b.
2. a = sup A and b = inf A.


Remark Note that the first part of the theorem does not follow from the second,
since there exist sets which do not contain their suprema and infima.


Proof We prove the results for the lim sup. The arguments for the lim inf are
analogous; the details are left to the reader. We consider, in turn, the cases where a is
finite, a = +∞, and a = −∞.

We first consider the case where a is finite. We will show that for any ε > 0, an
infinite number of terms of the sequence {x_k} lie in the interval (a − ε, a + ε). By
Theorem 1.9, this will establish that a is a limit point of {x_k}.

Suppose that for some ε > 0, only a finite number of the terms of the sequence
{x_k} lie in the interval (a − ε, a + ε). We will show that a contradiction must result.
If x_k ∈ (a − ε, a + ε) for only finitely many k, there must exist K sufficiently large
such that for all k ≥ K,

\[ x_k \notin (a - \epsilon, a + \epsilon). \]

We will show that for any k ≥ K, we must now have |a_k − a| ≥ ε, where, as in the
definition of the lim sup, a_k = sup{x_k, x_{k+1}, ...}. This violates the definition of a as
the limit of the sequence {a_k}, and provides the required contradiction.

So pick any k ≥ K. For any j ≥ k, there are only two possibilities: either
x_j ≥ a + ε, or x_j ≤ a − ε. Therefore, a_k must satisfy either (i) a_k ≥ a + ε (which
happens if there exists at least one j ≥ k such that x_j ≥ a + ε), or (ii) a_k ≤ a − ε
(which occurs if x_j ≤ a − ε for all j ≥ k). In either case, |a_k − a| ≥ ε, as required.
This completes the proof that a is a limit point of {x_k} when a is finite.

Now suppose a = +∞. This is possible only if a_k = +∞ for all k.⁴ But a_1 = +∞
implies

\[ \sup\{x_1, x_2, \ldots\} = +\infty, \]

and from the definition of the supremum, this is possible only if for all positive
integers p, there is k(p) such that x_{k(p)} ≥ p. Of course, this is the same as saying
that there is a subsequence of {x_k} which diverges to +∞. It therefore follows that
a is a limit point of {x_k} in this case also.

Finally, suppose a = −∞. This is possible only if the nonincreasing sequence
{a_k} is unbounded below, i.e., that for any positive integer p, there is k(p) such that
a_k ≤ −p for all k ≥ k(p). But a_k is the supremum of the set {x_k, x_{k+1}, ...}, so this
implies in turn that for all k ≥ k(p), we must have x_k ≤ −p. This states precisely
that the sequence {x_k} itself diverges to −∞, so of course a is a limit point of {x_k}
in this case also. This completes the proof of part 1 that a ∈ A.


⁴If a_k is finite for some k, then a_j must also be finite for any j > k, because a_j is the supremum over a
smaller set.

Part 2 will be proved if we can show that a is an upper bound of the set of limit
points A: we have already shown that a ∈ A, and if a is also an upper bound of A,
we must clearly have a = sup A. If a = +∞, it is trivial that a ≥ x must hold for
all x ∈ A. If a = −∞, we have shown that the sequence {x_k} itself must diverge
to −∞, so A consists of only the point a, and vacuously therefore, a ≥ x for all
x ∈ A.

This leaves the case where a is finite. Let x ∈ A. Then, by definition of A, there
exists a subsequence {x_{m(k)}} such that x_{m(k)} → x. Now, a_{m(k)}, as the supremum of the
set {x_{m(k)}, x_{m(k)+1}, x_{m(k)+2}, ...}, clearly satisfies a_{m(k)} ≥ x_{m(k)}, and since a_k → a,
it is also evidently true that a_{m(k)} → a. Therefore,

\[ a = \lim_{k \to \infty} a_{m(k)} \ge \lim_{k \to \infty} x_{m(k)} = x, \]

and the proof is complete. □


It is a trivial consequence of Theorem 1.17 that lim sup_{k→∞} x_k ≥ lim inf_{k→∞} x_k
for any sequence {x_k}. (This could also have been established directly from the
definitions, since a_k ≥ b_k for each k.) Strict inequality is, of course, possible: the
sequence {x_k} = {0,1,0,1,0,1,...} has lim sup_{k→∞} x_k = 1 and lim inf_{k→∞} x_k =
0. Indeed, the only situation in which equality obtains is identified in the following
result:


Theorem 1.18 A sequence {x_k} in R converges to a limit x ∈ R if and only if
lim sup_{k→∞} x_k = lim inf_{k→∞} x_k = x. Equivalently, {x_k} converges to x if and
only if every subsequence of {x_k} converges to x.


Proof From the second part of Theorem 1.17, lim sup_{k→∞} x_k is the supremum, and
lim inf_{k→∞} x_k the infimum, of the set of limit points of {x_k}. So lim sup_{k→∞} x_k =
lim inf_{k→∞} x_k implies that {x_k} has only one limit point, and therefore that it
converges. Conversely, if lim sup_{k→∞} x_k > lim inf_{k→∞} x_k, then the sequence {x_k} has
at least two limit points by the first part of Theorem 1.17. □


Finally, the following result is obtained by combining Theorems 1.18 and 1.7: 


Corollary 1.19 A sequence {x_k} in R^n converges to a limit x if and only if every
subsequence of {x_k} converges to x.


The Exercises contain some other useful properties of the lim sup and lim inf.


1.2.7 Open Balls, Open Sets, Closed Sets


Let x ∈ R^n. The open ball B(x, r) with center x and radius r > 0 is given by

\[ B(x, r) = \{ y \in \mathbb{R}^n \mid d(x, y) < r \}. \]

In words, B(x, r) is the set of points in R^n whose distance from x is strictly less than
r. If we replace the strict inequality with a weak inequality (d(x, y) ≤ r), then we
obtain the closed ball B̄(x, r).

A set S in R^n is said to be open if for all x ∈ S, there is an r > 0 such that
B(x, r) ⊂ S. Intuitively, S is open if given any x ∈ S, one can move a small distance
away from x in any direction without leaving S.
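The two definitions above translate directly into code. The sketch below assumes the Euclidean metric for d and uses hypothetical helper names; the last function illustrates openness of the interval (0, 1) in R by exhibiting, for each x in the interval, one radius r > 0 with B(x, r) ⊂ (0, 1).

```python
import math


def dist(x, y):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def in_open_ball(y, center, r):
    """True if y lies in B(center, r), i.e. d(center, y) < r."""
    return dist(center, y) < r


def radius_inside_unit_interval(x):
    """For x in (0, 1), a radius r > 0 with B(x, r) contained in (0, 1)."""
    return min(x, 1 - x)


print(in_open_ball((0.5, 0.5), (0.0, 0.0), 1.0))   # interior point of the unit ball
print(in_open_ball((1.0, 0.0), (0.0, 0.0), 1.0))   # boundary point: strict inequality fails
r = radius_inside_unit_interval(0.25)
```

The radius min(x, 1 − x) is one convenient choice; any smaller positive radius would serve equally well.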

A set S in R^n is said to be closed if its complement S^c = {x ∈ R^n | x ∉ S}
is open. An equivalent, and perhaps more intuitive, definition is provided by the
following result, using the notion of convergent sequences. Roughly put, it says that
a set S is closed if and only if any sequence formed by using only points of S cannot
“escape” from S.


Theorem 1.20 A set S ⊂ R^n is closed if and only if for all sequences {x_k} such that
x_k ∈ S for each k and x_k → x, it is the case that x ∈ S.


Proof Suppose S is closed. Let {x_k} be a sequence in S and suppose x_k → x. We
are to show that x ∈ S. Suppose x ∉ S, i.e., x ∈ S^c. Then, since S is closed, S^c
must be open, so there exists r > 0 such that B(x, r) ⊂ S^c. On the other hand,
by definition of x_k → x, there must exist k(r) such that for all k ≥ k(r), we have
d(x_k, x) < r, i.e., such that x_k ∈ B(x, r) ⊂ S^c. This contradicts the hypothesis that
{x_k} is a sequence in S, which proves the result.

Now suppose that for all sequences {x_k} in S such that x_k → x, we have x ∈ S.
We will show that S must be closed. If S is not closed, then S^c is not open. For the
openness of S^c to fail, there must exist a point x ∈ S^c such that no open ball with
center x is ever completely contained in S^c, i.e., such that every open ball with center
x and radius r > 0 has at least one point x(r) that is not in S^c. For k = 1, 2, 3, ...,
define r_k = 1/k, and let x_k = x(r_k). Then, by construction x_k ∉ S^c for each k,
so x_k ∈ S for each k. Moreover, since x_k ∈ B(x, r_k) for each k, it is the case that
d(x_k, x) < r_k = 1/k, so x_k → x. But this implies that x must be in S, a contradiction.
□
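The sequential criterion gives a quick way to see that a set fails to be closed. The sketch below assumes the example S = (0, 1] and the sequence x_k = 1/k, both chosen for illustration: every term lies in S, but the limit 0 does not, so the sequence "escapes" from S.

```python
from fractions import Fraction


def in_S(x):
    """Membership in S = (0, 1]."""
    return 0 < x <= 1


xs = [Fraction(1, k) for k in range(1, 1001)]   # x_k = 1/k, all in S
limit = 0                                       # x_k -> 0

print(all(in_S(x) for x in xs))   # every term of the sequence lies in S
print(in_S(limit))                # the limit does not
```

A finite list of terms cannot prove convergence, of course; the code only checks the membership claims, while x_k → 0 follows from d(x_k, 0) = 1/k.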


Among the commonly encountered closed and open sets are the “closed unit
interval” [0, 1] defined as {x ∈ R | 0 ≤ x ≤ 1}, and the “open unit interval”
(0, 1) defined as {x ∈ R | 0 < x < 1}. Observe that there exist sets that are
neither open nor closed such as the intervals [0, 1) = {x ∈ R | 0 ≤ x < 1}, and
(0, 1] = {x ∈ R | 0 < x ≤ 1}.


1.2.8 Bounded Sets and Compact Sets


A set S in R^n is said to be bounded if there exists r > 0 such that S ⊂ B(0, r),
that is, if S is completely contained in some open ball in R^n centered at the origin.
For instance, the interval (0, 1) is a bounded subset of R, but the set of integers
{1, 2, 3, ...} is not.

A set S ⊂ R^n is said to be compact if for all sequences of points {x_k} such that
x_k ∈ S for each k, there exists a subsequence {x_{m(k)}} of {x_k} and a point x ∈ S
such that x_{m(k)} → x. In words, this definition is abbreviated as “a set is compact if
every sequence contains a convergent subsequence.” If S ⊂ R^n is compact, it is easy
to see that S must be bounded. For, if S were unbounded, it would be possible to
pick a sequence {x_k} in S such that ‖x_k‖ > k, and such a sequence cannot contain
a convergent subsequence (why?). Similarly, if S is compact, it must also be closed.
If not, there would exist a sequence {x_k} in S such that x_k → x, where x ∉ S.
All subsequences of this sequence then also converge to x, and since x ∉ S, the
definition of compactness is violated. Thus, every compact set in R^n must be closed
and bounded.

The following result establishes that the converse of this statement is also true. It
is a particularly useful result since it gives us an easy way of identifying compact
sets in R^n.


Theorem 1.21 A set S ⊂ R^n is compact if and only if it is closed and bounded.


Proof See Rudin (1976, Theorem 2.41, p.40). □


1.2.9 Convex Combinations and Convex Sets


Given any finite collection of points x_1, ..., x_m ∈ R^n, a point z ∈ R^n is said to be
a convex combination of the points (x_1, ..., x_m) if there exists λ ∈ R^m satisfying
(i) λ_i ≥ 0, i = 1, ..., m, and (ii) Σ_{i=1}^m λ_i = 1, such that z = Σ_{i=1}^m λ_i x_i.

A set S ⊂ R^n is convex if the convex combination of any two points in S is also
in S. Intuitively, S is convex if the straight line joining any two points in S is itself
completely contained in S.


For example, the closed and open unit intervals [0, 1] and (0, 1) are both convex
subsets of R, while the unit disk

\[ D = \{ x \in \mathbb{R}^2 \mid \|x\| \le 1 \} \]

is a convex subset of R^2. On the other hand, the unit circle

\[ C = \{ x \in \mathbb{R}^2 \mid \|x\| = 1 \} \]

is not a convex subset of R^2 since (−1, 0) ∈ C and (1, 0) ∈ C, but the convex
combination (0, 0) ∉ C. Additional examples of convex and nonconvex sets are
given in Figure 1.3.


[Fig. 1.3. Convex and Nonconvex Sets]
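The definition of a convex combination, and the unit-circle counterexample, can be checked directly. The sketch below is an illustration only: points are represented as tuples of floats, and convex_combination and norm are hypothetical helper names.

```python
import math


def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] after checking conditions (i) and (ii)."""
    if len(points) != len(weights):
        raise ValueError("need one weight per point")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))


def norm(x):
    """Euclidean norm of a point given as a tuple."""
    return math.sqrt(sum(xi * xi for xi in x))


# Midpoint of (-1, 0) and (1, 0): it lies in the unit disk D but not on the circle C.
z = convex_combination([(-1.0, 0.0), (1.0, 0.0)], [0.5, 0.5])
print(z, norm(z))
```

The same function accepts any finite number of points, matching the general definition with m points and weights λ_1, ..., λ_m.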


1.2.10 Unions, Intersections, and Other Binary Operations 


This section summarizes some important properties of open, closed, compact, and 
convex sets. While most of the properties listed here are used elsewhere in the book, 
some are presented solely for their illustrative value. 

Some definitions first. A set A is said to index a collection of sets 𝒮 in R^n if a set
S_α ∈ 𝒮 is specified for each α ∈ A, and each S ∈ 𝒮 corresponds to some α ∈ A.
We will denote such a collection by (S_α)_{α∈A}, or when A is understood, simply by
(S_α). When the index set A consists of a finite number of elements, we will call it
a finite index set; if there are no restrictions on the number of elements in A, we
will call it an arbitrary index set. If B ⊂ A, then the collection (S_β)_{β∈B} is called a
subcollection of (S_α)_{α∈A}. If B consists of only a finite number of elements, (S_β)_{β∈B}
is called a finite subcollection. For notational clarity, we will use (S_φ)_{φ∈F} to denote
a finite subcollection, and retain (S_β)_{β∈B} to denote an arbitrary subcollection.

Given any collection of sets (S_α) indexed by A, recall that the union of the
collection (S_α), denoted ∪_{α∈A} S_α, and the intersection of the collection (S_α), denoted
∩_{α∈A} S_α, are defined as

\[ \bigcup_{\alpha \in A} S_\alpha = \{ x \in \mathbb{R}^n \mid x \in S_\alpha \text{ for some } \alpha \in A \}, \]
\[ \bigcap_{\alpha \in A} S_\alpha = \{ x \in \mathbb{R}^n \mid x \in S_\alpha \text{ for all } \alpha \in A \}. \]


The following pair of identities, known as DeMorgan’s laws, relates unions,
intersections, and complementation.


Theorem 1.22 (DeMorgan’s Laws) Let A be an arbitrary index set, and let (G_α)
be a collection of sets in R^n indexed by A. Then,

1. (∪_{α∈A} G_α)^c = ∩_{α∈A} G_α^c, and
2. (∩_{α∈A} G_α)^c = ∪_{α∈A} G_α^c.


Proof We prove part 1 here. Part 2 may be established analogously, and is left as
an exercise. We first show that each element of the set on the left-hand side (LHS)
of (1) is also an element of the right-hand side (RHS), thereby establishing LHS ⊂
RHS. Then, we show that RHS ⊂ LHS also, completing the proof.

So suppose x ∈ LHS. Then, by definition, x ∉ G_α for any α ∈ A, so x ∈ G_α^c for
all α, and, therefore x ∈ RHS. So LHS ⊂ RHS.

Now suppose x ∈ RHS. Then x ∈ G_α^c for all α, or x ∉ G_α for any α ∈ A. It
follows that x ∈ LHS. Thus we also have RHS ⊂ LHS. □
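Both identities are easy to check on a small finite example. The sketch below makes simplifying assumptions that are not part of the theorem: the ambient space R^n is replaced by a ten-element universe, and the indexed collection (G_α) is a three-element dictionary chosen for illustration.

```python
universe = set(range(10))                      # finite stand-in for R^n
G = {1: {0, 1, 2}, 2: {2, 3}, 3: {2, 7, 8}}    # indexed collection (G_alpha)


def complement(s):
    """Complement of s relative to the chosen universe."""
    return universe - s


union = set().union(*G.values())
intersection = set(universe).intersection(*G.values())

# Part 1: complement of the union equals the intersection of the complements.
lhs1 = complement(union)
rhs1 = set(universe).intersection(*(complement(g) for g in G.values()))

# Part 2: complement of the intersection equals the union of the complements.
lhs2 = complement(intersection)
rhs2 = set().union(*(complement(g) for g in G.values()))

print(lhs1 == rhs1, lhs2 == rhs2)
```

A finite check like this is no substitute for the proof, which handles arbitrary index sets, but it makes the two inclusions in the argument concrete.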


The next set of results deals with conditions under which closedness and openness 
are preserved under unions and intersections. 


Theorem 1.23 Let A be an arbitrary index set. Suppose for each α ∈ A, G_α is an
open set. Then, ∪_{α∈A} G_α is also open. That is, the arbitrary union of open sets is
open.


Proof Suppose x ∈ ∪_{α∈A} G_α. Then, x ∈ G_α for some α. Since G_α is open, there is
r > 0 such that B(x, r) ⊂ G_α ⊂ ∪_{α∈A} G_α. □


Theorem 1.24 Let A be an arbitrary index set. Suppose for each α ∈ A, H_α is a
closed set. Then, ∩_{α∈A} H_α is closed. That is, the arbitrary intersection of closed sets
is closed.


Proof By definition, ∩_{α∈A} H_α is closed if and only if (∩_{α∈A} H_α)^c is open. By
DeMorgan’s laws, (∩_{α∈A} H_α)^c = ∪_{α∈A} H_α^c. For each α ∈ A, H_α^c is open since H_α is
closed. Now use Theorem 1.23. □


Theorem 1.25 Let G_1, ..., G_l be open sets, where l is some positive integer. Then,
∩_{i=1}^l G_i is open. That is, the finite intersection of open sets is open.


Proof Let x ∈ ∩_{i=1}^l G_i. We will show that there is r > 0 such that B(x, r) ⊂
∩_{i=1}^l G_i. Since x is an arbitrary point in the intersection, the proof will be complete.

Since x is in the intersection, we must have x ∈ G_i for each i. Since G_i is open,
there exists r_i > 0 such that B(x, r_i) ⊂ G_i. Let r denote the minimum of the finite
set of numbers r_1, ..., r_l. Clearly, r > 0. Moreover, B(x, r) ⊂ B(x, r_i) for each i,
so B(x, r) ⊂ G_i for each i. Therefore, B(x, r) ⊂ ∩_{i=1}^l G_i. □


Theorem 1.26 Let H_1, ..., H_l be closed sets, where l is some positive integer. Then,
∪_{i=1}^l H_i is closed. That is, the finite union of closed sets is closed.


Proof This is an immediate consequence of Theorem 1.25 and DeMorgan’s laws.
□


Unlike Theorems 1.23 and 1.24, neither Theorem 1.25 nor Theorem 1.26 is valid 
if we allow for arbitrary (i.e., possibly infinite) intersections and unions respectively. 
Consider the following counterexample: 


Example 1.27 Let A = {1, 2, 3, ...}. For each α ∈ A, let G_α be the open interval
(0, 1 + 1/α), and H_α be the closed interval [0, 1 − 1/α]. Then, ∩_{α∈A} G_α = (0, 1],
which is not open, while ∪_{α∈A} H_α = [0, 1), which is not closed. □
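The first claim of the example, ∩ G_α = (0, 1], can be probed numerically. The sketch below uses exact rational arithmetic and hypothetical helper names: the point 1 belongs to every G_α that is tested, while any x > 1 is excluded by some G_α, whose index the second function finds by direct search.

```python
from fractions import Fraction


def in_G(x, alpha):
    """Membership in G_alpha = (0, 1 + 1/alpha)."""
    return 0 < x < 1 + Fraction(1, alpha)


def first_excluding_index(x):
    """For x > 1, the smallest alpha with x not in G_alpha."""
    alpha = 1
    while in_G(x, alpha):
        alpha += 1
    return alpha


print(all(in_G(1, alpha) for alpha in range(1, 1000)))   # 1 is in each tested G_alpha
print(first_excluding_index(Fraction(51, 50)))           # an index that excludes 1.02
```

Since 1 is in the intersection but every ball around 1 contains points larger than 1, the intersection cannot be open, which is the point of the example.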


Since compact sets are necessarily closed, all of the properties we have identified 
here for closed sets also apply to compact sets. The next collection of results describes 
some properties that are special to compact sets. Given a subcollection (S_β)_{β∈B}
of a collection (S_α)_{α∈A}, say that the subcollection has a nonempty intersection if
∩_{β∈B} S_β ≠ ∅.


Theorem 1.28 Suppose (S_α)_{α∈A} is a collection of compact sets in R^n such that every
finite subcollection (S_φ)_{φ∈F} has nonempty intersection. Then the entire collection
has nonempty intersection, i.e., ∩_{α∈A} S_α ≠ ∅.


Proof See Rudin (1976, Theorem 2.36, p.38). □


As an immediate corollary, we have the result that every nested sequence of
compact sets has a nonempty intersection:


Corollary 1.29 Suppose (S_k)_{k∈N} is a collection of compact sets in R^n that is nested,
i.e., such that S_{k+1} ⊂ S_k for each k. Then ∩_{k=1}^∞ S_k ≠ ∅.


Compactness is essential in Theorem 1.28 and Corollary 1.29. The following 
examples show that the results may fail if the sets are allowed to be noncompact. 


Example 1.30 For k ∈ N, let S_k = [k, ∞). Then, S_k is noncompact for each k (it is
closed but not bounded). The collection (S_k)_{k∈N} is clearly nested since S_{k+1} ⊂ S_k
for each k. However, ∩_{k=1}^∞ S_k is empty. To see this, consider any x ∈ R. If k is any
integer larger than x, then x ∉ S_k, so x cannot be in the intersection. □


Example 1.31 Let S_k = (0, 1/k), k ∈ N. Once again, S_k is noncompact for each
k (this time it is bounded but not closed). Since 1/k > 1/(k + 1), the collection
(S_k)_{k∈N} is evidently nested. But ∩_{k=1}^∞ S_k is again empty. For if x ∈ (0, 1), x ∉ S_k for
any k satisfying kx > 1; while if x ≤ 0 or x ≥ 1, x ∉ S_k for any k. □
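The arguments in Examples 1.30 and 1.31 are constructive: for each x they name an index k with x ∉ S_k. The sketch below writes those witnesses as functions (the function names are hypothetical) and checks them on a few sample points.

```python
import math
from fractions import Fraction


def excluded_from_ray(x):
    """An index k with x not in S_k = [k, infinity), as in Example 1.30."""
    return max(1, math.floor(x) + 1)      # an integer in N larger than x


def excluded_from_interval(x):
    """An index k with x not in S_k = (0, 1/k), as in Example 1.31."""
    if x <= 0 or x >= 1:
        return 1                          # x lies outside S_1, hence outside every S_k
    return math.floor(1 / x) + 1          # this k satisfies k * x > 1


samples = [Fraction(-3), Fraction(3, 10), Fraction(1, 2), Fraction(15, 2)]
ray_witnesses = [excluded_from_ray(x) for x in samples]
interval_witnesses = [excluded_from_interval(x) for x in samples]
print(ray_witnesses, interval_witnesses)
```

Because every real number has such a witness, no point can belong to all of the S_k, so both intersections are empty.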


Next, given sets S_1, S_2 in R^n, define their sum S_1 + S_2 by:

\[ S_1 + S_2 = \{ x \in \mathbb{R}^n \mid x = x_1 + x_2, \; x_1 \in S_1, \; x_2 \in S_2 \}. \]


Theorem 1.32 If S_1 and S_2 are compact, so is S_1 + S_2.


Proof Let {x_k} be a sequence in S_1 + S_2. We will show that it has a convergent
subsequence, i.e., that there is a point x ∈ S_1 + S_2 and a subsequence l(k) of k such
that x_{l(k)} → x.

For each k, there must exist points y_k ∈ S_1 and z_k ∈ S_2 such that x_k = y_k + z_k.
Since S_1 is compact, there is a subsequence m(k) of k such that y_{m(k)} → y ∈ S_1.
Now {z_{m(k)}}, as a subsequence of {z_k}, itself defines a sequence in the compact set
S_2. Thus, there must exist a further subsequence of m(k) (denoted, say, l(k)) such
that z_{l(k)} → z ∈ S_2. Since l(k) is a subsequence of m(k) and y_{m(k)} → y, it is clearly
the case that y_{l(k)} → y. Therefore, x_{l(k)} = y_{l(k)} + z_{l(k)} is a subsequence of {x_k} that
converges to y + z. Since y ∈ S_1 and z ∈ S_2, (y + z) ∈ S_1 + S_2, so we have shown
the existence of the required subsequence. □


The following example, which shows that the word “compact” in Theorem 1.32
cannot be replaced with “closed,” provides another illustration of the power of
compactness.


Example 1.33 Let S_1 = {(x, y) ∈ R^2 | xy = 1} and S_2 = {(x, y) ∈ R^2 | xy =
−1} (see Figure 1.4). Then, S_1 and S_2 are closed. For each k = 1, 2, 3, ..., (k, 1/k) ∈
S_1 and (−k, 1/k) ∈ S_2, so (0, 2/k) ∈ S_1 + S_2. This sequence converges to (0, 0),
which cannot be in S_1 + S_2. To see this, observe that (0, 0) can be in S_1 + S_2 if and
only if there is a point of the form (x, y) ∈ S_1 such that (−x, −y) ∈ S_2. However,
xy = (−x)(−y), so (x, y) ∈ S_1 implies (−x, −y) ∈ S_1 also. □


[Fig. 1.4. The Sum of Closed Sets Need Not Be Closed]
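The sequence used in Example 1.33 can be generated explicitly. The sketch below uses exact rational arithmetic and hypothetical helper names; it confirms that each (k, 1/k) lies in S_1, each (−k, 1/k) lies in S_2, and that their sums are the points (0, 2/k), which approach (0, 0).

```python
from fractions import Fraction


def in_S1(p):
    """Membership in S_1 = {(x, y) : xy = 1}."""
    return p[0] * p[1] == 1


def in_S2(p):
    """Membership in S_2 = {(x, y) : xy = -1}."""
    return p[0] * p[1] == -1


def add(p, q):
    """Componentwise sum of two points of R^2."""
    return (p[0] + q[0], p[1] + q[1])


sums = []
for k in range(1, 101):
    p = (Fraction(k), Fraction(1, k))     # point of S_1
    q = (Fraction(-k), Fraction(1, k))    # point of S_2
    assert in_S1(p) and in_S2(q)
    sums.append(add(p, q))                # equals (0, 2/k)

print(sums[0], sums[-1])
```

The code only exhibits the sequence; that its limit (0, 0) lies outside S_1 + S_2 is the algebraic argument given in the example.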


The next result describes a property that is equivalent to compactness. A definition 
is required first. A collection of open sets (S_α)_{α∈A} in R^n is said to be an open cover
of a given set S ⊂ R^n, if

\[ S \subset \bigcup_{\alpha \in A} S_\alpha. \]

The open cover (S_α)_{α∈A} of S is said to admit a finite subcover if there exists a finite
subcollection (S_φ)_{φ∈F} such that

\[ S \subset \bigcup_{\varphi \in F} S_\varphi. \]

It is an elementary matter to construct sets S in R^n and open covers (S_α)_{α∈A} of S,
such that (S_α)_{α∈A} admits no finite subcover. Here are two examples:


Example 1.34 Let S be the closed unbounded interval [0, ∞), and let A = N. For
k ∈ N, define S_k to be the open interval (k − 2, k + 2). Clearly, (S_k)_{k∈N} is an open cover
of S that has no finite subcover. □


Example 1.35 Let S be the bounded open interval (0, 1), and let A = N. For
k ∈ N, let S_k be the open interval (1/(k + 2), 1/k). A little reflection shows that
∪_{k∈N} S_k = (0, 1) = S, and therefore that the collection (S_k)_{k∈N} is an open cover
of S. The collection (S_k)_{k∈N}, moreover, admits no finite subcover. For, suppose
{k_1, ..., k_l} is any finite subset of N. Let k* = max{k_1, ..., k_l}. If x is such that
0 < x ≤ 1/(k* + 2), then x ∉ ∪_{i=1}^l S_{k_i}, so the finite subcollection (S_{k_i})_{i=1}^l does not
cover S. □
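The argument in Example 1.35 produces, for any finite set of indices, a point of S that the corresponding sets miss. The sketch below implements that witness with exact rational arithmetic; the function names and the sample index set are illustrative choices.

```python
from fractions import Fraction


def in_S_k(x, k):
    """Membership in S_k = (1/(k+2), 1/k)."""
    return Fraction(1, k + 2) < x < Fraction(1, k)


def uncovered_point(indices):
    """A point of (0, 1) missed by every S_k with k in `indices`."""
    k_star = max(indices)
    return Fraction(1, k_star + 2)


F = [1, 4, 9, 25]                 # an arbitrary finite subset of N
x = uncovered_point(F)
print(x, any(in_S_k(x, k) for k in F))
```

The same point is covered by some S_k with a larger index, which is consistent with the full collection being a cover of S.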


The following result states that examples such as these are impossible if (and only 
if) S is compact. 


Theorem 1.36 A set S in R^n is compact if and only if every open cover of S has a
finite subcover.


Proof See Rudin (1976, Theorem 2.41, p.40). □


There is yet another property, called the finite intersection property, that is
equivalent to compactness. This property is described in the Exercises.

Lastly, we turn to some of the properties of convex sets:


Theorem 1.37 Suppose (S_α)_{α∈A} is a collection of convex sets in R^n. Then ∩_{α∈A} S_α
is also convex.


Proof Let S = ∩_{α∈A} S_α. If x_1, x_2 ∈ S, then x_1, x_2 ∈ S_α for every α ∈ A. Since
each S_α is convex, any convex combination of x_1 and x_2 is in each S_α, and therefore
in S. □


Theorem 1.38 Suppose S_1 and S_2 are convex sets in R^n. Then S_1 + S_2 is convex.


Proof Let S = S_1 + S_2. Pick any x, y ∈ S and λ ∈ (0, 1). Then, there exist points
x_1 ∈ S_1 and x_2 ∈ S_2 such that x = x_1 + x_2; and points y_1 ∈ S_1 and y_2 ∈ S_2 such
that y = y_1 + y_2. Since S_1 and S_2 are both convex, λx_1 + (1 − λ)y_1 ∈ S_1 and
λx_2 + (1 − λ)y_2 ∈ S_2. Therefore,

\[ [\lambda x_1 + (1 - \lambda) y_1] + [\lambda x_2 + (1 - \lambda) y_2] \in S. \]

But λx_1 + (1 − λ)y_1 + λx_2 + (1 − λ)y_2 = λ(x_1 + x_2) + (1 − λ)(y_1 + y_2) = λx + (1 − λ)y,
so λx + (1 − λ)y ∈ S, as required. □


On the other hand, the convexity of S_1 and S_2 obviously has no implications for
S_1 ∪ S_2. For instance, if S_1 = [0, 1] and S_2 = [2, 3], then S_1 and S_2 are both convex,
but S_1 ∪ S_2 is not.


1.3 Matrices 
1.3.1 Sum, Product, Transpose 


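An n × m matrix A is an array

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} \]

where a_{ij} is a real number for each i ∈ {1, ..., n} and j ∈ {1, ..., m}. We will
frequently refer to a_{ij} as the (i, j)-th entry of the matrix A. In more compact notation,
we write the matrix as A = (a_{ij}). The vector

\[ [\, a_{i1} \;\; \cdots \;\; a_{im} \,] \]

is called the i-th row of the matrix A, and will be denoted A_i^r, i = 1, ..., n. The
vector

\[ \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix} \]

is called the j-th column of the matrix A, and will be denoted A_j^c, j = 1, ..., m.
Thus, an n × m matrix A has n rows and m columns, and may be represented variously
as

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} = \begin{pmatrix} A_1^r \\ \vdots \\ A_n^r \end{pmatrix} = [\, A_1^c \;\; \cdots \;\; A_m^c \,]. \]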


If A and B are two n × m matrices, their sum A + B is the n × m matrix whose
(i, j)-th entry is a_{ij} + b_{ij}:

\[ A + B = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1m} + b_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nm} + b_{nm} \end{pmatrix}. \]

Observe that A + B is defined only if A and B have the same numbers of rows and
columns.
If $A$ is an $n \times m$ matrix and $B$ is an $m \times k$ matrix, their product $AB$ is the $n \times k$
matrix whose $(i, j)$-th entry is the inner product of the $i$-th row $A_i^r$ of $A$ and the $j$-th
column $B_j^c$ of $B$:

\[
AB = \begin{bmatrix}
A_1^r \cdot B_1^c & A_1^r \cdot B_2^c & \cdots & A_1^r \cdot B_k^c \\
A_2^r \cdot B_1^c & A_2^r \cdot B_2^c & \cdots & A_2^r \cdot B_k^c \\
\vdots & \vdots & \ddots & \vdots \\
A_n^r \cdot B_1^c & A_n^r \cdot B_2^c & \cdots & A_n^r \cdot B_k^c
\end{bmatrix}.
\]

Of course, for any $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, k\}$, $A_i^r \cdot B_j^c = \sum_{l=1}^{m} a_{il} b_{lj}$. Note
that for the product $AB$ to be well-defined, the number of columns in $A$ must be the
same as the number of rows in $B$. Note also that the product $AB$ is not, in general,
the same as the product $BA$. Indeed, if $A$ is an $n \times m$ matrix, and $B$ is an $m \times k$
matrix, then $AB$ is well-defined, but $BA$ is not even defined unless $n = k$.


Theorem 1.39 The matrix sum $A + B$ and product $AB$ have the following properties:


1. Addition is commutative: $A + B = B + A$.

2. Addition is associative: $(A + B) + C = A + (B + C)$.

3. Multiplication is associative: $(AB)C = A(BC)$.

4. Multiplication distributes over addition: $A(B + C) = AB + AC$.


Proof Immediate from the definitions. □


The transpose of a matrix $A$, denoted $A'$, is the matrix whose $(i, j)$-th entry is $a_{ji}$.
Thus, if $A$ is an $n \times m$ matrix, $A'$ is an $m \times n$ matrix.
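For example, if $A$ is the $3 \times 2$ matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
a_{31} & a_{32}
\end{bmatrix},
\]

then $A'$ is the $2 \times 3$ matrix

\[
A' = \begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32}
\end{bmatrix}.
\]
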
It is easy to check from the definitions that the transpose has the following prop- 
erties with respect to the sum and product. (For the second property, note that the 


product B’A’ is well-defined whenever the product AB is well defined and vice 
versa.) 


1. $(A + B)' = A' + B'$.
2. $(AB)' = B'A'$.

Finally, a word on notation. It is customary in linear algebra to regard vectors
$x \in \mathbb{R}^n$ as column vectors, i.e., as $n \times 1$ matrices, and to write $x'$ when one wishes to
represent $x$ as a row vector, i.e., as a $1 \times n$ matrix. In particular, under this convention,
we would write $x'y$ to denote the inner product of the vectors $x$ and $y$, rather than
$x \cdot y$ as we have chosen to do. In an abuse of notation, we will continue to use $x \cdot y$
to denote the inner product, but will, in the context of matrix multiplication, regard
$x$ as a column vector. Thus, for instance, we will use $Ax$ to denote the product of
the $m \times n$ matrix $A$ and the vector $x \in \mathbb{R}^n$. Similarly, we will use $x'A$ to denote the
pre-multiplication of the $n \times k$ matrix $A$ by the vector $x$.
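
The definitions of the sum, product, and transpose can be checked on small cases. The following is a minimal sketch in plain Python; representing a matrix as a list of rows, and the helper names `mat_add`, `mat_mul`, and `transpose`, are illustrative choices and not notation from the text.

```python
# A matrix is represented as a list of rows (an illustrative convention).

def mat_add(A, B):
    """Entrywise sum; defined only when A and B have the same shape."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A + B requires matrices with the same numbers of rows and columns")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    """The (i, j)-th entry of the transpose is the (j, i)-th entry of A."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """The (i, j)-th entry is the inner product of row i of A and column j of B."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("AB requires the number of columns of A to equal the number of rows of B")
    cols_B = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

A = [[1, 2], [3, 4], [5, 6]]   # a 3 x 2 matrix
B = [[1, 0, 2], [0, 1, 3]]     # a 2 x 3 matrix

AB = mat_mul(A, B)             # 3 x 3
BA = mat_mul(B, A)             # 2 x 2, so AB and BA cannot be equal
```

Here both products are defined because $n = k = 3$, yet they do not even have the same dimensions, and the identity $(AB)' = B'A'$ can be confirmed by comparing `transpose(mat_mul(A, B))` with `mat_mul(transpose(B), transpose(A))`.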


1.3.2 Some Important Classes of Matrices 


The object of this subsection is to single out some important classes of matrices. 
Properties of these matrices are highlighted in the succeeding sections on rank, de- 
terminants, and inverses. 


Square Matrix 


A square matrix is an $n \times m$ matrix $A$ for which $n = m$ (i.e., the number of rows
and columns are the same). The common value of $n$ and $m$ for a square matrix $A$
is called the order of the matrix. Given a square matrix $A = (a_{ij})$ of order $n$, the
elements $a_{ii}$, $i = 1, \ldots, n$, are called the diagonal entries of $A$, and the elements $a_{ij}$
for $i \neq j$ are called the off-diagonal elements.


Symmetric Matrix 


A square matrix $A$ of order $n$ is called a symmetric matrix if for all $i, j \in \{1, \ldots, n\}$,
we have $a_{ij} = a_{ji}$. Observe that $A$ is a symmetric matrix if and only if it coincides
with its transpose, i.e., if and only if we have $A = A'$.


Diagonal Matrix 


A diagonal matrix $D$ of order $n$ is a square matrix of order $n$, all of whose off-diagonal
entries are zero:

\[
D = \begin{bmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & d_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_{nn}
\end{bmatrix}.
\]

Note that every diagonal matrix is also symmetric.


Identity Matrix


The identity matrix of order $n$ is a square $n \times n$ matrix whose diagonal entries are all
equal to unity, and whose off-diagonal entries are all zero:

\[
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

The identity matrix is a diagonal matrix (and, therefore, also a symmetric matrix). In
addition, it has the property that if $A$ and $B$ are any $k \times n$ and $n \times m$ matrices, then
$AI = A$ and $IB = B$. In particular, therefore, $I^2 = I \times I = I$.


Lower-Triangular Matrix 


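A lower-triangular matrix of order $n$ is a square matrix $D$ of order $n$ which has the
property that all the entries above the diagonal are zero:

\[
D = \begin{bmatrix}
d_{11} & 0 & \cdots & 0 \\
d_{21} & d_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
d_{n1} & d_{n2} & \cdots & d_{nn}
\end{bmatrix}.
\]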


Upper-triangular Matrix 


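An upper-triangular matrix of order $n$ is a square matrix of order $n$ which has the
property that all the entries below the diagonal are zero:

\[
D = \begin{bmatrix}
d_{11} & d_{12} & \cdots & d_{1n} \\
0 & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_{nn}
\end{bmatrix}.
\]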


Note that the transpose of an upper-triangular matrix is a lower-triangular matrix, 
and vice versa. 


1.3.3 Rank of a Matrix 


Let a finite collection of vectors $x_1, \ldots, x_k$ in $\mathbb{R}^n$ be given. The vectors $x_1, \ldots, x_k$
are said to be linearly dependent if there exist real numbers $\alpha_1, \ldots, \alpha_k$, with $\alpha_i \neq 0$
for some $i$, such that

\[ \alpha_1 x_1 + \cdots + \alpha_k x_k = 0. \]

If, on the other hand, the only solution to $\alpha_1 x_1 + \cdots + \alpha_k x_k = 0$ is $\alpha_1 = \cdots =
\alpha_k = 0$, the vectors $x_1, \ldots, x_k$ are said to be linearly independent.
For example, the vectors x, y, and z given by 


[display of the vectors x, y, and z lost in extraction]


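are linearly dependent since $x + 3y - 2z = 0$. On the other hand, the following
vectors $x$, $y$, and $z$ are obviously linearly independent:

\[
x = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
y = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
z = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]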


Let $A$ be an $n \times m$ matrix. Then, each of the $n$ rows of $A$ defines a vector in $\mathbb{R}^m$.
The row rank of $A$, denoted $\rho^r(A)$, is defined to be the maximum number of linearly
independent rows of $A$. That is, the row rank of $A$ is $k$ if the following conditions
both hold:


1. There is some subset $(l_1, \ldots, l_k)$ of $k$ distinct integers of the set $\{1, \ldots, n\}$ such
that the vectors $A_{l_1}^r, \ldots, A_{l_k}^r$ are linearly independent.

2. For all selections of $(k + 1)$ distinct integers $(l_1, \ldots, l_{k+1})$ from $\{1, \ldots, n\}$, the
vectors $A_{l_1}^r, \ldots, A_{l_{k+1}}^r$ are linearly dependent.


Note that if $k = n$, the second condition is redundant, since it is not possible to select
$(n + 1)$ distinct integers from a set consisting of only $n$ integers.

Similarly, each of the $m$ columns of $A$ defines a vector in $\mathbb{R}^n$, and the column rank
of $A$, denoted $\rho^c(A)$, is defined as the maximum number of linearly independent
columns in $A$. Since $A$ is an $n \times m$ matrix, we must have $\rho^r(A) \le n$ and $\rho^c(A) \le m$.
Among the most useful results in linear algebra is:
Among the most useful results in linear algebra is: 


Theorem 1.40 The row rank $\rho^r(A)$ of any $n \times m$ matrix $A$ coincides with its column
rank $\rho^c(A)$.


Proof See Munkres (1964, Corollary 7.3, p.27) or Johnston (1984, Chapter 4,
p.115). □


In the sequel, we will denote this common value of $\rho^r(A)$ and $\rho^c(A)$ by $\rho(A)$.
Note that since $\rho^r(A) \le n$ and $\rho^c(A) \le m$, we must have $\rho(A) \le \min\{m, n\}$.
An immediate consequence of Theorem 1.40, which we present as a corollary for
emphasis, is that the rank of a matrix coincides with the rank of its transpose:


Corollary 1.41 The rank $\rho(A)$ of any matrix $A$ is equal to the rank $\rho(A')$ of its
transpose $A'$.


Our next result lists some important transformations of a matrix which leave its
rank unchanged.


Theorem 1.42 Let $A$ be a given $n \times m$ matrix. If $B$ is an $n \times m$ matrix obtained
from $A$


1. by interchanging any two rows of $A$, or

2. by multiplying each entry in a given row by a nonzero constant $\alpha$, or

3. by replacing a given row, say the $i$-th, by itself plus a scalar multiple $\alpha$ of some
other row, say the $j$-th,


the rank $\rho(B)$ of $B$ is the same as the rank $\rho(A)$ of $A$. The same result is true if the
word “row” in each of the operations above is replaced by “column.”


Proof See Munkres (1964, Theorem 5.1, p.15). □


The three operations described in Theorem 1.42 are termed the “elementary row
operations.” They play a significant role in solving systems of linear equations by
the method of transforming the coefficient matrix to reduced row-echelon form (see
Munkres, 1964, for details).
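
Because the elementary row operations leave the rank unchanged, they also give a practical way to compute it: reduce the matrix and count the nonzero rows that remain. The sketch below is one such procedure under stated assumptions; the function name `rank`, the list-of-rows representation, and the use of exact `Fraction` arithmetic (to avoid rounding) are illustrative choices, not part of the text.

```python
from fractions import Fraction

def rank(A):
    """Rank of A, found by row reduction with the three elementary row operations."""
    M = [[Fraction(x) for x in row] for row in A]   # exact arithmetic, no rounding
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    r = 0                                           # number of pivot rows found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]             # operation 1: interchange two rows
        p = M[r][c]
        M[r] = [x / p for x in M[r]]                # operation 2: scale a row by a nonzero constant
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                # operation 3: replace row i by itself plus a multiple of row r
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 2, 3],
     [2, 4, 6],    # twice the first row, so the rows are linearly dependent
     [1, 0, 1]]
A_transpose = [list(col) for col in zip(*A)]
```

For this `A`, `rank(A)` and `rank(A_transpose)` agree, as Corollary 1.41 requires.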

Finally, an $m \times n$ matrix $A$ is said to be of full rank if $\rho(A) = \min\{m, n\}$. In
particular, a square matrix $A$ of order $n$ is said to have full rank if $\rho(A) = n$.


Theorem 1.43 Let $A$ be an $m \times n$ matrix.


1. If $B$ is any $n \times k$ matrix, then $\rho(AB) \le \min\{\rho(A), \rho(B)\}$.


2. Suppose $P$ and $Q$ are square matrices of orders $m$ and $n$, respectively, that are
both of full rank. Then, $\rho(PA) = \rho(AQ) = \rho(PAQ) = \rho(A)$.


Proof See Johnston (1984, Chapter 4, p.122). □


1.3.4 The Determinant 


Throughout this subsection, it will be assumed that $A$ is a square matrix of order $n$.
The determinant is a function that assigns to every square matrix $A$ a real number
denoted $|A|$. The formal definition of this function is somewhat complicated. We
present it in several steps.

Given the finite set of integers $\{1, \ldots, n\}$, a permutation of this set is a rewriting
of these $n$ integers in some order, say $j_1, \ldots, j_n$. Fix a permutation, and pick any $j_i$.
Count the number of integers that follow $j_i$ in the ordering $j_1, \ldots, j_n$, but that
precede it in the natural order $1, \ldots, n$. This number is the number of inversions caused
by $j_i$. When we determine the number of inversions caused by each $j_i$ and sum them
up, we obtain the total number of inversions in the permutation $j_1, \ldots, j_n$. If this
total is an odd number, we call the permutation an odd permutation; if the total is
even, we call the permutation an even permutation.

For example, consider the permutation $\{4, 2, 3, 1\}$ of the integers $\{1, 2, 3, 4\}$. The
number of inversions caused by the number 4 in the permutation is 3, since all three
numbers that follow it in the permutation also precede it in the natural ordering.
The number 2 in the permutation causes only one inversion, since only the integer 1
follows it in the permutation and also precedes it in the natural ordering. Similarly, the
number 3 also causes only one inversion, and, finally, the number 1 evidently causes
no inversions. Thus, the total number of inversions in the permutation $\{4, 2, 3, 1\}$ is
$3 + 1 + 1 = 5$, making it an odd permutation.
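
The counting rule for inversions is mechanical enough to write out directly. The following short sketch reproduces the count for the permutation above; the function names are hypothetical and chosen only for this illustration.

```python
def inversions_caused_by(perm):
    """For each entry, count the later entries that precede it in the natural order."""
    return [sum(1 for later in perm[i + 1:] if later < perm[i])
            for i in range(len(perm))]

def total_inversions(perm):
    """Total number of inversions in the permutation."""
    return sum(inversions_caused_by(perm))

def is_odd_permutation(perm):
    """A permutation is odd when its total number of inversions is odd."""
    return total_inversions(perm) % 2 == 1

perm = (4, 2, 3, 1)
per_entry = inversions_caused_by(perm)   # one count per entry, in the order 4, 2, 3, 1
```

For `perm = (4, 2, 3, 1)` the per-entry counts are 3, 1, 1, and 0, matching the calculation in the text.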

Now let an $n \times n$ matrix $A = (a_{ij})$ be given. Let $j_1, \ldots, j_n$ denote any permutation
of the integers $\{1, \ldots, n\}$, and consider the vector $(a_{1j_1}, \ldots, a_{nj_n})$. Note that this
vector consists of one entry from each row of $A$, and that no two entries lie in the
same column. Take the product of these entries

\[ a_{1j_1} a_{2j_2} \cdots a_{nj_n}. \]

Prefix a $+$ sign to this product if the permutation $j_1, \ldots, j_n$ is an even permutation
of $\{1, \ldots, n\}$, and a $-$ sign if the permutation is odd.

When all possible such expressions $a_{1j_1} \cdots a_{nj_n}$ are written down (prefixed with
the appropriate $\pm$ sign), the sum of these expressions gives us a number, which is
the determinant of $A$, and which we have denoted $|A|$.

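For example, given the $2 \times 2$ matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix}
\]

we have $|A| = a_{11} a_{22} - a_{21} a_{12}$. The expression for the determinant of a $3 \times 3$ matrix
is more complicated. If

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\]

then

\[
\begin{aligned}
|A| = {} & a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} \\
& + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31}.
\end{aligned}
\]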




For higher order matrices, the expressions get progressively messier: there are $4! =
24$ permutations of the set $\{1, 2, 3, 4\}$, so there are 24 expressions in the determinant
of a $4 \times 4$ matrix, while the number for a $5 \times 5$ matrix is $5! = 120$.
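
The permutation definition translates directly into code, which also makes the factorial growth in the number of terms visible. The sketch below is only an illustration: the function names are hypothetical, indices run from 0 rather than 1, and the sign convention used is the standard one ($+$ for even permutations, $-$ for odd), which is the convention consistent with $|A| = a_{11} a_{22} - a_{21} a_{12}$.

```python
from itertools import permutations

def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one, by counting inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[j] < perm[i])
    return 1 if inversions % 2 == 0 else -1

def det_by_permutations(A):
    """Sum of signed products, one entry from each row and no two from the same column."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):      # n! permutations of the column indices
        term = permutation_sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]            # the entry from row i, column perm[i]
        total += term
    return total

A2 = [[1, 2],
      [3, 4]]
number_of_terms_4x4 = len(list(permutations(range(4))))
```

For `A2` the two signed products are $1 \cdot 4$ and $-(2 \cdot 3)$. Since the loop visits all $n!$ permutations, this approach is useful only for very small matrices.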

It is evident that the definition of the determinant we have provided is not of 
much use from a computational standpoint. In a later subsection, we offer some 
easier procedures for calculating determinants. These procedures are based on the 
following very useful properties: 


Theorem 1.44 Let $A$ be a square matrix of order $n$.


1. If the matrix $B$ is obtained from the matrix $A$ by interchanging any two rows of
$A$, then $|B| = -|A|$.

2. If $B$ is obtained from $A$ by multiplying each entry of some given row of $A$ by the
nonzero constant $\alpha$, then $|B| = \alpha |A|$.

3. If $B$ is obtained from $A$ by replacing some row of $A$ (say row $i$) by row $i$ plus $\alpha$
times row $j$, then $|B| = |A|$.

4. If $A$ has a row of zeros, then $|A| = 0$.

5. If $A$ is a lower-triangular (or upper-triangular) matrix of order $n$, then the
determinant of $A$ is the product of the diagonal terms: $|A| = a_{11} \cdots a_{nn}$. In
particular, the determinant of the identity matrix is unity: $|I| = 1$.


Each of the first four properties remains valid if the word “row” is replaced by
“column.”


Proof See Munkres (1964, Chapter 8). □


The notion of the determinant is closely related to that of the rank of a matrix. To 
state this relationship requires one further definition. Given a (possibly nonsquare) 
n x m matrix B, any matrix obtained from B by deleting some of its rows and/or 
columns is called a submatrix of B. For instance, if 


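\[
B = \begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33} \\
b_{41} & b_{42} & b_{43}
\end{bmatrix},
\]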


then the following are all submatrices of B: 


bi bn bg 

b b 
[bai ] a ee ba bn baz 
bar bas b3 b32 bzg 
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Of course, not all submatrices of B need be square. The following, for instance, is 
not: 


bu bn 
ba bn 
bar ba 


Theorem 1.45 Let $B$ be an $n \times m$ matrix. Let $k$ be the order of the largest square
submatrix of $B$ whose determinant is nonzero. Then, $\rho(B) = k$. In particular, the
rows of $B$ are linearly independent if and only if $B$ contains some square submatrix
of order $n$ whose determinant is nonzero.


Proof See Munkres (1964, Theorem 8.1). □


A special case of this result is that square matrices have full rank if and only if
they have a non-zero determinant:


Corollary 1.46 A square matrix $A$ of order $n$ has rank $n$ if, and only if, $|A| \neq 0$.


1.3.5 The Inverse 


Let a square $n \times n$ matrix $A$ be given. The inverse of $A$, denoted $A^{-1}$, is defined to
be an $n \times n$ matrix $B$ which has the property that $AB = I$.


Theorem 1.47 Let $A$ be an $n \times n$ matrix.


1. A necessary and sufficient condition for $A$ to have an inverse is that $A$ have rank
$n$, or, equivalently, that $|A| \neq 0$.


2. If $A^{-1}$ exists, it is unique, i.e., two different matrices $B$ and $C$ cannot both be
the inverse of a given matrix $A$.


Proof See Johnston (1984, Chapter 4). □


It is possible to describe a procedure for constructing the inverse of a square matrix
$A$ which satisfies the condition that $|A| \neq 0$. Consider first the $(n - 1) \times (n - 1)$
submatrix of $A$ that is obtained when row $i$ and column $j$ are deleted. Denote the
determinant of this submatrix by $|A(ij)|$. Define

\[ C_{ij}(A) = (-1)^{i+j} |A(ij)|. \]

$C_{ij}(A)$ is called the $(i, j)$-th cofactor of $A$.

Now, construct an $n \times n$ matrix $C(A)$ whose $(i, j)$-th entry is $C_{ij}(A)$. The transpose
of $C(A)$ is called the adjoint of $A$, and is denoted $\mathrm{Adj}(A)$:

\[
\mathrm{Adj}(A) = \begin{bmatrix}
C_{11}(A) & \cdots & C_{n1}(A) \\
\vdots & \ddots & \vdots \\
C_{1n}(A) & \cdots & C_{nn}(A)
\end{bmatrix}.
\]

Finally, when each element of $\mathrm{Adj}(A)$ is divided by $|A|$, the result is $A^{-1}$. Thus,

\[ A^{-1} = \frac{1}{|A|} \mathrm{Adj}(A). \]
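
The adjoint construction can be followed step by step on a small matrix. The sketch below is a minimal illustration, not an efficient algorithm: the helper names are hypothetical, indices start at 0 (which leaves the sign $(-1)^{i+j}$ unchanged), the input is assumed to have integer or `Fraction` entries, and the determinant of each submatrix is computed by expansion along its first row.

```python
from fractions import Fraction

def submatrix(A, i, j):
    """A(ij): the matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row (the empty matrix has determinant 1)."""
    if len(A) == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j)) for j in range(len(A)))

def cofactor(A, i, j):
    """C_ij(A) = (-1)^(i+j) |A(ij)|."""
    return (-1) ** (i + j) * det(submatrix(A, i, j))

def inverse(A):
    """A^{-1} = Adj(A) / |A|, where Adj(A) is the transpose of the cofactor matrix."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix has zero determinant, so no inverse exists")
    # Entry (i, j) of Adj(A) is the cofactor C_ji(A); note the swapped indices.
    return [[Fraction(cofactor(A, j, i), d) for j in range(n)] for i in range(n)]

A = [[1, 5, 7],
     [3, 2, 8],
     [6, 1, 9]]
A_inv = inverse(A)
```

Multiplying `A` by `A_inv` should return the identity matrix exactly, since `Fraction` arithmetic involves no rounding.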


We list some useful properties of the inverse in our next result. In stating these 
properties, it is assumed in each case that the relevant inverse is well defined. 


Theorem 1.48 The inverse has the following properties: 


1. The inverse of $A^{-1}$ is $A$: $(A^{-1})^{-1} = A$.
2. The inverse of the transpose is the transpose of the inverse: $(A')^{-1} = (A^{-1})'$.
3. $(AB)^{-1} = B^{-1} A^{-1}$.
4. $|A^{-1}| = 1/|A|$.
5. The inverse of a lower- (resp. upper-) triangular matrix is also a lower- (resp.
upper-) triangular matrix.


Proof See Johnston (1984, Chapter 4, pp.133-135). □


1.3.6 Calculating the Determinant 


We offer in this subsection two methods for calculating the determinant of a square
matrix $A$. Each method is easier to use than directly following the definition of the
determinant.

The first method is based on the observation that, from the definition of the
determinant, the expression for $|A|$ for any given $n \times n$ matrix $A$ can be written as

\[ |A| = a_{11} C_{11}(A) + \cdots + a_{1n} C_{1n}(A). \]

Here, as in the previous subsection, $C_{ij}(A)$ is the $(i, j)$-th cofactor of the matrix $A$,
i.e.,

\[ C_{ij}(A) = (-1)^{i+j} |A(ij)|, \]

where $A(ij)$ is the $(n - 1) \times (n - 1)$ matrix obtained from $A$ by deleting row $i$ and
column $j$.




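This observation gives us a recursive method for computing the determinant, since
it enables us to express the determinant of an $n \times n$ matrix $A$ in terms of the
determinants of the $(n - 1) \times (n - 1)$ submatrices $A(11), \ldots, A(1n)$. By reapplying the
method, we can express the determinants of each of these $(n - 1) \times (n - 1)$
submatrices in terms of some of the $(n - 2) \times (n - 2)$ submatrices of these submatrices,
and so on. Thus, for example, for a $3 \times 3$ matrix $A = (a_{ij})$, we obtain:

\[
\begin{vmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{vmatrix}
= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\]

For example, if

\[
A = \begin{bmatrix}
1 & 5 & 7 \\
3 & 2 & 8 \\
6 & 1 & 9
\end{bmatrix},
\]

then, we have

\[ |A| = 1(18 - 8) - 5(27 - 48) + 7(3 - 12) = 10 + 105 - 63 = 52. \]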
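
The recursion is short enough to state as code. The following is a minimal sketch of cofactor expansion; the names `minor` and `det_cofactor` are illustrative, rows are indexed from 0, and the optional `row` argument is included only to let the expansion be tried along a row other than the first.

```python
def minor(A, i, j):
    """The submatrix A(ij): delete row i and column j of A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A, row=0):
    """Determinant by cofactor expansion along the given row (0-indexed)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # Each term is a_{row,j} times its cofactor (-1)^(row+j) |A(row j)|;
    # the smaller determinants are expanded along their own first rows.
    return sum((-1) ** (row + j) * A[row][j] * det_cofactor(minor(A, row, j))
               for j in range(n))

A = [[1, 5, 7],
     [3, 2, 8],
     [6, 1, 9]]
first_row_terms = [(-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j)) for j in range(3)]
```

The three entries of `first_row_terms` are the summands $10$, $105$, and $-63$ from the worked example.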


Nothing in this method relies on using the first row of the matrix. Any row (or, for
that matter, column) would do as well; in fact, we have

\[
|A| = a_{i1} C_{i1} + \cdots + a_{in} C_{in}
    = a_{1j} C_{1j} + \cdots + a_{nj} C_{nj}.
\]


The second method is based on exploiting the following three properties of the 
determinant: 


1. If $B$ is obtained from $A$ by interchanging any two rows (or columns), then
$|B| = -|A|$.

2. If $B$ is obtained from $A$ by replacing any row $i$ (resp. column $i$) by itself plus $\alpha$
times some other row $j$ (resp. some other column $j$), then $|B| = |A|$.

3. If $A$ is a (lower- or upper-) triangular matrix, then the determinant of $A$ is the
product of its diagonal elements.


Specifically, the idea is to repeatedly use properties 1 and 2 to convert a given matrix
$A$ into a triangular matrix $B$, and then use property 3. This method is easier to use than
the earlier one on large matrices (say, order 4 or larger). Nonetheless, we illustrate
its use through the same $3 \times 3$ matrix used to illustrate the earlier method:

\[
A = \begin{bmatrix}
1 & 5 & 7 \\
3 & 2 & 8 \\
6 & 1 & 9
\end{bmatrix}.
\]

We have:

\[
\begin{vmatrix} 1 & 5 & 7 \\ 3 & 2 & 8 \\ 6 & 1 & 9 \end{vmatrix}
= \begin{vmatrix} 1 & 5 & 7 \\ 0 & -13 & -13 \\ 6 & 1 & 9 \end{vmatrix}
= \begin{vmatrix} 1 & 5 & 7 \\ 0 & -13 & -13 \\ 0 & -29 & -33 \end{vmatrix}
= \begin{vmatrix} 1 & 5 & 7 \\ 0 & -13 & -13 \\ 0 & 0 & -4 \end{vmatrix},
\]

where the second expression is obtained from the first by replacing row 2 by itself
plus $(-3)$ times row 1; the third is obtained from the second by replacing row 3 by
itself plus $(-6)$ times row 1; and, finally, the last expression is obtained from the
third by replacing row 3 by itself plus $(-29/13)$ times row 2. Since the last is in
triangular form, the determinant of $A$ is seen to be given by $(1)(-13)(-4) = 52$.
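
The triangularization method can be sketched as follows. This is one possible arrangement of the steps, not a prescribed algorithm: the function names are hypothetical, exact `Fraction` arithmetic is assumed so that multipliers such as $-29/13$ introduce no rounding, and a row interchange is used only when a zero appears in the pivot position.

```python
from fractions import Fraction

def triangularize(A):
    """Reduce A to upper-triangular form using properties 1 and 2.

    Returns (M, sign), where sign is -1 if an odd number of row interchanges
    was performed and +1 otherwise, so that |A| = sign * |M|.
    """
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            continue                       # column already zero on and below the diagonal
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign                   # property 1: an interchange flips the sign
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            # property 2: adding a multiple of another row leaves |A| unchanged
            M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
    return M, sign

def det_by_elimination(A):
    """Property 3: the determinant of a triangular matrix is the product of its diagonal."""
    M, sign = triangularize(A)
    result = Fraction(sign)
    for i in range(len(M)):
        result *= M[i][i]
    return result

A = [[1, 5, 7],
     [3, 2, 8],
     [6, 1, 9]]
T, s = triangularize(A)
```

For this matrix no interchange is needed, and `T` is the same triangular matrix reached in the text.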


1.4 Functions 


Let $S$, $T$ be subsets of $\mathbb{R}^n$ and $\mathbb{R}^l$, respectively. A function $f$ from $S$ to $T$, denoted
$f: S \to T$, is a rule that associates with each element of $S$, one and only one element
of $T$. The set $S$ is called the domain of the function $f$, and the set $T$ its range.


1.4.1 Continuous Functions 


Let $f: S \to T$, where $S \subset \mathbb{R}^n$ and $T \subset \mathbb{R}^l$. Then, $f$ is said to be continuous at
$x \in S$ if for all $\epsilon > 0$, there is $\delta > 0$ such that $y \in S$ and $d(x, y) < \delta$ implies
$d(f(x), f(y)) < \epsilon$. (Note that $d(x, y)$ is the distance between $x$ and $y$ in $\mathbb{R}^n$, while
$d(f(x), f(y))$ is the distance in $\mathbb{R}^l$.) In the language of sequences, $f: S \to T$
is continuous at $x \in S$ if for all sequences $\{x_k\}$ such that $x_k \in S$ for all $k$, and
$\lim_{k \to \infty} x_k = x$, it is the case that $\lim_{k \to \infty} f(x_k) = f(x)$.

Intuitively, $f$ is continuous at $x$ if the value of $f$ at any point $y$ that is “close” to $x$
is a good approximation of the value of $f$ at $x$. Thus, the identity function $f(x) = x$
for all $x \in \mathbb{R}$ is continuous at each $x \in \mathbb{R}$, while the function $f: \mathbb{R} \to \mathbb{R}$ given by

\[
f(x) = \begin{cases} 0, & x \le 0 \\ 1, & x > 0 \end{cases}
\]

is continuous everywhere except at $x = 0$. At $x = 0$, every open ball $B(x, \delta)$ with
center $x$ and radius $\delta > 0$ contains at least one point $y > 0$. At all such points,
$f(y) = 1 > 0 = f(x)$, and this approximation does not get better, no matter how
close $y$ gets to $x$ (i.e., no matter how small we take $\delta$ to be).

A function $f: S \to T$ is said to be continuous on $S$ if it is continuous at each point
in $S$.

Observe that if $f: S \subset \mathbb{R}^n \to \mathbb{R}^l$, then $f$ consists of $l$ “component functions”
$(f^1, \ldots, f^l)$, i.e., there are functions $f^i: S \to \mathbb{R}$, $i = 1, \ldots, l$, such that for each
$x \in S$, we have $f(x) = (f^1(x), \ldots, f^l(x))$. It is left to the reader as an exercise to
show that $f$ is continuous at $x \in S$ (resp. $f$ is continuous on $S$) if and only if each
$f^i$ is continuous at $x$ (resp. if and only if each $f^i$ is continuous on $S$).

The following result gives us an equivalent characterization of continuity that 
comes in quite useful in practice: 


Theorem 1.49 A function $f: S \subset \mathbb{R}^n \to \mathbb{R}^l$ is continuous at a point $x \in S$ if and
only if for all open sets $V \subset \mathbb{R}^l$ such that $f(x) \in V$, there is an open set $U \subset \mathbb{R}^n$
such that $x \in U$, and $f(z) \in V$ for all $z \in U \cap S$.


Proof Suppose $f$ is continuous at $x$, and $V$ is an open set in $\mathbb{R}^l$ containing $f(x)$.
Suppose, per absurdum, that the theorem was false, so for any open set $U$ containing
$x$, there is $y \in U \cap S$ such that $f(y) \notin V$. We will show a contradiction. For
$k \in \{1, 2, 3, \ldots\}$, let $U_k$ be the open ball with center $x$ and radius $1/k$. Let $y_k \in U_k \cap S$
be such that $f(y_k) \notin V$. The sequence $\{y_k\}$ is clearly well defined, and since $y_k \in U_k$
for all $k$, we have $d(x, y_k) < 1/k$ for each $k$, so $y_k \to x$ as $k \to \infty$. Since $f$ is
continuous at $x$ by hypothesis, we also have $f(y_k) \to f(x)$ as $k \to \infty$. However,
$f(y_k) \notin V$ for any $k$, and since $V$ is open, $V^c$ is closed, so $f(x) = \lim_k f(y_k) \in V^c$,
which contradicts $f(x) \in V$.

Now, suppose that for each open set $V$ containing $f(x)$, there is an open set $U$
containing $x$ such that $f(y) \in V$ for all $y \in U \cap S$. We will show that $f$ is continuous
at $x$. Let $\epsilon > 0$ be given. Define $V_\epsilon$ to be the open ball in $\mathbb{R}^l$ with center $f(x)$ and
radius $\epsilon$. Then, there exists an open set $U_\epsilon$ containing $x$ such that $f(y) \in V_\epsilon$ for all
$y \in U_\epsilon \cap S$. Pick any $\delta > 0$ so that $B(x, \delta) \subset U_\epsilon$. Then, by construction, it is true
that $y \in S$ and $d(x, y) < \delta$ implies $f(y) \in V_\epsilon$, i.e., that $d(f(x), f(y)) < \epsilon$. Since
$\epsilon > 0$ was arbitrary, we have shown precisely that $f$ is continuous at $x$. □


As an immediate corollary, we have the following statement, which is usually 
abbreviated as: “a function is continuous if and only if the inverse image of every 
open set is open.” 


Corollary 1.50 A function $f: S \subset \mathbb{R}^n \to \mathbb{R}^l$ is continuous on $S$ if and only if for
each open set $V \subset \mathbb{R}^l$, there is an open set $U \subset \mathbb{R}^n$ such that $f^{-1}(V) = U \cap S$,
where $f^{-1}(V)$ is defined by

\[ f^{-1}(V) = \{x \in S \mid f(x) \in V\}. \]

In particular, if $S$ is an open set in $\mathbb{R}^n$, $f$ is continuous on $S$ if and only if $f^{-1}(V)$
is an open set in $\mathbb{R}^n$ for each open set $V$ in $\mathbb{R}^l$.


Finally, some observations. Note that continuity of a function $f$ at a point $x$ is a
local property, i.e., it relates to the behavior of $f$ near $x$, but tells us nothing about the
behavior of $f$ elsewhere. In particular, the continuity of $f$ at $x$ has no implications
even for the continuity of $f$ at points “close” to $x$. Indeed, it is easy to construct
functions that are continuous at a given point $x$, but that are discontinuous at every
other point in every neighborhood of $x$ (see the Exercises). It is also important to
note that, in general, functions need not be continuous at even a single point in their
domain. Consider $f: \mathbb{R}_+ \to \mathbb{R}_+$ given by $f(x) = 1$, if $x$ is a rational number, and
$f(x) = 0$, otherwise. This function is discontinuous everywhere on $\mathbb{R}_+$.


Fig. 1.5. The Derivative 


1.4.2 Differentiable and Continuously Differentiable Functions 


Throughout this subsection S will denote an open set in R”. 

A function $f: S \to \mathbb{R}^m$ is said to be differentiable at a point $x \in S$ if there exists
an $m \times n$ matrix $A$ such that for all $\epsilon > 0$, there is $\delta > 0$ such that $y \in S$ and
$\|x - y\| < \delta$ implies

\[ \|f(x) - f(y) - A(x - y)\| \le \epsilon \|x - y\|. \]

Equivalently, $f$ is differentiable at $x \in S$ if

\[ \lim_{y \to x} \frac{\|f(y) - f(x) - A(y - x)\|}{\|y - x\|} = 0. \]

(The notation “$y \to x$” is shorthand for “for all sequences $\{y_k\}$ such that $y_k \to x$.”)
The matrix $A$ in this case is called the derivative of $f$ at $x$ and is denoted $Df(x)$.
Figure 1.5 provides a graphical illustration of the derivative. In keeping with standard
practice, we shall, in the sequel, denote $Df(x)$ by $f'(x)$ whenever $n = m = 1$, i.e.,
whenever $S \subset \mathbb{R}$ and $f: S \to \mathbb{R}$.

Remark The definition of the derivative $Df$ may be motivated as follows. An affine
function from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a function $g$ of the form

\[ g(y) = Ay + b, \]

where $A$ is an $m \times n$ matrix, and $b \in \mathbb{R}^m$. (When $b = 0$, the function $g$ is called
linear.) Intuitively, the derivative of $f$ at a point $x \in S$ is the best affine approximation
to $f$ at $x$, i.e., the best approximation of $f$ around the point $x$ by an affine function
$g$. Here, “best” means that the ratio

\[ \frac{\|f(y) - g(y)\|}{\|y - x\|} \]

goes to zero as $y \to x$. Since the values of $f$ and $g$ must coincide at $x$ (otherwise $g$
would hardly be a good approximation to $f$ at $x$), we must have $g(x) = Ax + b =
f(x)$, or $b = f(x) - Ax$. Thus, we may write this approximating function $g$ as

\[ g(y) = Ay - Ax + f(x) = A(y - x) + f(x). \]

Given this value for $g(y)$, the task of identifying the best affine approximation to $f$
at $x$ now amounts to identifying a matrix $A$ such that

\[
\frac{\|f(y) - g(y)\|}{\|y - x\|} = \frac{\|f(y) - (A(y - x) + f(x))\|}{\|y - x\|} \to 0 \quad \text{as } y \to x.
\]

This is precisely the definition of the derivative we have given. □


If $f$ is differentiable at all points in $S$, then $f$ is said to be differentiable on $S$.
When $f$ is differentiable on $S$, the derivative $Df$ itself forms a function from $S$ to
$\mathbb{R}^{m \times n}$. If $Df: S \to \mathbb{R}^{m \times n}$ is a continuous function, then $f$ is said to be continuously
differentiable on $S$, and we write $f$ is $C^1$.

The following observations are immediate from the definitions. A function $f: S \subset
\mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in S$ if and only if each of the $m$ component
functions $f^i: S \to \mathbb{R}$ of $f$ is differentiable at $x$, in which case we have $Df(x) =
(Df^1(x), \ldots, Df^m(x))$. Moreover, $f$ is $C^1$ on $S$ if and only if each $f^i$ is $C^1$ on $S$.

The difference between differentiability and continuous differentiability is non- 
trivial. The following example shows that a function may be differentiable every- 
where, but may still not be continuously differentiable. 


Example 1.51 Let $f: \mathbb{R} \to \mathbb{R}$ be given by

\[
f(x) = \begin{cases} 0 & \text{if } x = 0 \\ x^2 \sin(1/x^2) & \text{if } x \neq 0. \end{cases}
\]

For $x \neq 0$, we have

\[ f'(x) = 2x \sin\left(\frac{1}{x^2}\right) - \left(\frac{2}{x}\right) \cos\left(\frac{1}{x^2}\right). \]

Since $|\sin(\cdot)| \le 1$ and $|\cos(\cdot)| \le 1$, but $(2/x) \to \infty$ as $x \to 0$, it is clear that the
limit as $x \to 0$ of $f'(x)$ is not well defined. However, $f'(0)$ does exist! Indeed,

\[ f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x} = \lim_{x \to 0} x \sin\left(\frac{1}{x^2}\right). \]

Since $|\sin(1/x^2)| \le 1$, we have $|x \sin(1/x^2)| \le |x|$, so $x \sin(1/x^2) \to 0$ as $x \to 0$.
This means $f'(0) = 0$. Thus, $f$ is not $C^1$ on $\mathbb{R}$. □
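
A numerical look at this example can make the two limits concrete. The sketch below is an illustration only: the sample points $x_k = 1/\sqrt{2\pi k}$ are a choice made here (they are not in the text) because $\sin(1/x_k^2) = 0$ and $\cos(1/x_k^2) = 1$ there, so that $f'(x_k) = -2/x_k$. All values are floating-point approximations.

```python
import math

def f(x):
    """The function of Example 1.51."""
    return 0.0 if x == 0 else x * x * math.sin(1.0 / (x * x))

def f_prime(x):
    """The derivative formula for x != 0."""
    return 2.0 * x * math.sin(1.0 / (x * x)) - (2.0 / x) * math.cos(1.0 / (x * x))

def difference_quotient_at_zero(h):
    """(f(h) - f(0)) / h, whose limit as h -> 0 is f'(0)."""
    return (f(h) - f(0.0)) / h

def x_k(k):
    """Sample points tending to 0 at which f'(x_k) = -2 / x_k (an illustrative choice)."""
    return 1.0 / math.sqrt(2.0 * math.pi * k)

# The difference quotients at 0 shrink with h ...
quotients = [difference_quotient_at_zero(10.0 ** (-p)) for p in range(1, 7)]
# ... while f' is unbounded along the points x_k.
derivatives = [f_prime(x_k(k)) for k in (1, 10, 100, 1000)]
```

The entries of `quotients` are bounded in absolute value by the corresponding $|h|$, while the entries of `derivatives` grow in magnitude like $-2\sqrt{2\pi k}$.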


This example notwithstanding, it is true that the derivative of an everywhere
differentiable function $f$ must possess a minimal amount of continuity. See the
Intermediate Value Theorem for the Derivative in subsection 1.6.1 for details.

We close this subsection with a statement of two important properties of the
derivative. First, given two functions $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^n \to \mathbb{R}^m$, define their sum
$(f + g)$ to be the function from $\mathbb{R}^n$ to $\mathbb{R}^m$ whose value at any $x \in \mathbb{R}^n$ is $f(x) + g(x)$.


Theorem 1.52 If $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are both differentiable at a point
$x \in \mathbb{R}^n$, so is $(f + g)$ and, in fact,

\[ D(f + g)(x) = Df(x) + Dg(x). \]

Proof Obvious from the definition of differentiability. □


Next, given functions $f: \mathbb{R}^n \to \mathbb{R}^m$ and $h: \mathbb{R}^k \to \mathbb{R}^n$, define their composition
$f \circ h$ to be the function from $\mathbb{R}^k$ to $\mathbb{R}^m$ whose value at any $x \in \mathbb{R}^k$ is given by
$f(h(x))$, that is, by the value of $f$ evaluated at $h(x)$.


Theorem 1.53 Let $f: \mathbb{R}^n \to \mathbb{R}^m$ and $h: \mathbb{R}^k \to \mathbb{R}^n$. Let $x \in \mathbb{R}^k$. If $h$ is differentiable
at $x$, and $f$ is differentiable at $h(x)$, then $f \circ h$ is itself differentiable at $x$, and its
derivative may be obtained through the “chain rule” as:

\[ D(f \circ h)(x) = Df(h(x)) \, Dh(x). \]

Proof See Rudin (1976, Theorem 9.15, p.214). □
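
The chain rule says that the derivative of the composition is a matrix product, which can be checked numerically on a concrete pair of maps. In the sketch below, the functions `h` and `f`, their hand-computed derivative matrices `Dh` and `Df`, and the central-difference approximation `numerical_jacobian` are all assumptions made for illustration; none of them comes from the text.

```python
import math

def h(x):
    """An illustrative map from R^2 to R^2."""
    return [x[0] * x[1], x[0] + x[1] ** 2]

def Dh(x):
    """Derivative of h at x, a 2 x 2 matrix of partial derivatives."""
    return [[x[1], x[0]],
            [1.0, 2.0 * x[1]]]

def f(u):
    """An illustrative map from R^2 to R^2."""
    return [u[0] ** 2 + u[1], math.sin(u[0]) * u[1]]

def Df(u):
    """Derivative of f at u."""
    return [[2.0 * u[0], 1.0],
            [math.cos(u[0]) * u[1], math.sin(u[0])]]

def mat_mul(A, B):
    """Matrix product of A and B, given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(g, x, eps=1e-6):
    """Approximate Dg(x) column by column using central differences."""
    m = len(g(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        g_plus, g_minus = g(x_plus), g(x_minus)
        for i in range(m):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * eps)
    return J

def composition(x):
    """The composition f o h."""
    return f(h(x))

x0 = [0.7, -1.3]
by_chain_rule = mat_mul(Df(h(x0)), Dh(x0))          # Df(h(x)) Dh(x)
by_differences = numerical_jacobian(composition, x0)
```

The two matrices agree up to the error of the finite-difference approximation. Note the order of the factors: $Df$ is evaluated at $h(x)$, not at $x$, and it multiplies $Dh(x)$ from the left.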


Theorems 1.52 and 1.53 are only one-way implications. For instance, while the
differentiability of $f$ and $g$ at $x$ implies the differentiability of $(f + g)$ at $x$, $(f + g)$
can be differentiable everywhere (even $C^1$) without $f$ and $g$ being differentiable
anywhere. For an example, let $f: \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 1$ if $x$ is rational, and
$f(x) = 0$ otherwise, and let $g: \mathbb{R} \to \mathbb{R}$ be given by $g(x) = 0$ if $x$ is rational, and
$g(x) = 1$ otherwise. Then, $f$ and $g$ are discontinuous everywhere, so are certainly
not differentiable anywhere. However, $(f + g)(x) = 1$ for all $x$, so $(f + g)'(x) = 0$
at all $x$, meaning $(f + g)$ is $C^1$. Similarly, the differentiability of $f \circ h$ has no
implications for the differentiability of $f$ at $h(x)$ or the differentiability of $h$ at $x$.


1.4.3 Partial Derivatives and Differentiability 
Let $f: S \to \mathbb{R}$, where $S \subset \mathbb{R}^n$ is an open set. Let $e_j$ denote the vector in $\mathbb{R}^n$ that
has a 1 in the $j$-th place and zeros elsewhere ($j = 1, \ldots, n$). Then the $j$-th partial
derivative of $f$ is said to exist at a point $x$ if there is a number $\partial f(x)/\partial x_j$ such that

\[ \lim_{t \to 0} \frac{f(x + t e_j) - f(x)}{t} = \frac{\partial f}{\partial x_j}(x). \]

Among the more pleasant facts of life are the following:


Theorem 1.54 Let $f: S \to \mathbb{R}$, where $S \subset \mathbb{R}^n$ is open.


1. If $f$ is differentiable at $x$, then all partials $\partial f(x)/\partial x_j$ exist at $x$, and $Df(x) =
[\partial f(x)/\partial x_1, \ldots, \partial f(x)/\partial x_n]$.

2. If all the partials exist and are continuous at $x$, then $Df(x)$ exists, and $Df(x) =
[\partial f(x)/\partial x_1, \ldots, \partial f(x)/\partial x_n]$.

3. $f$ is $C^1$ on $S$ if and only if all partial derivatives of $f$ exist and are continuous
on $S$.


Proof See Rudin (1976, Theorem 9.21, p.219). □


Thus, to check if $f$ is $C^1$, we only need figure out if (a) the partial derivatives all
exist on $S$, and (b) if they are all continuous on $S$. On the other hand, the
requirement that the partials not only exist but be continuous at $x$ is very important for the
coincidence of the vector of partials with $Df(x)$. In the absence of this condition,
all partials could exist at some point without the function itself being differentiable
at that point. Consider the following example:


Example 1.55 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be given by $f(0, 0) = 0$, and for $(x, y) \neq (0, 0)$,

\[ f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}. \]

We will show that $f$ has all partial derivatives everywhere (including at $(0,0)$), but
that these partials are not continuous at $(0,0)$. Then we will show that $f$ is not
differentiable at $(0,0)$.

Since $f(x, 0) = 0$ for any $x \neq 0$, it is immediate that for all $x \neq 0$,

\[
\frac{\partial f}{\partial y}(x, 0) = \lim_{y \to 0} \frac{f(x, y) - f(x, 0)}{y} = \lim_{y \to 0} \frac{x}{\sqrt{x^2 + y^2}} = \frac{x}{|x|},
\]

which equals 1 when $x > 0$ (and $-1$ when $x < 0$). Similarly, at all points of the form
$(0, y)$ for $y \neq 0$, we have $\partial f(0, y)/\partial x = y/|y|$. However, note that

\[
\frac{\partial f}{\partial x}(0, 0) = \lim_{x \to 0} \frac{f(x, 0) - f(0, 0)}{x} = \lim_{x \to 0} \frac{0 - 0}{x} = 0,
\]

so $\partial f(0, 0)/\partial x$ exists at $(0, 0)$, but is not the limit of $\partial f(0, y)/\partial x$ as $y \to 0$, since
$|\partial f(0, y)/\partial x| = 1$ for every $y \neq 0$. Similarly, we also have $\partial f(0, 0)/\partial y = 0$, while
$|\partial f(x, 0)/\partial y| = 1$ for every $x \neq 0$.

Suppose $f$ were differentiable at $(0, 0)$. Then, the derivative $Df(0, 0)$ must
coincide with the vector of partials at $(0,0)$, so we must have $Df(0, 0) = (0, 0)$. However,
from the definition of the derivative, we must also have

\[
\lim_{(x,y) \to (0,0)} \frac{\|f(x, y) - f(0, 0) - Df(0, 0) \cdot (x, y)\|}{\|(x, y) - (0, 0)\|} = 0,
\]

but this is impossible if $Df(0, 0) = (0, 0)$. To see this, take any point $(x, y)$ of the form
$(a, a)$ for some $a > 0$, and note that every neighborhood of $(0,0)$ contains at least
one such point. Since $f(0, 0) = 0$, $Df(0, 0) = (0, 0)$, and $\|(x, y)\| = \sqrt{x^2 + y^2}$, it
follows that

\[
\frac{\|f(a, a) - f(0, 0) - Df(0, 0) \cdot (a, a)\|}{\|(a, a) - (0, 0)\|} = \frac{a^2}{2a^2} = \frac{1}{2},
\]

so the limit of this fraction as $a \to 0$ cannot be zero. □
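
The two halves of this example can be checked numerically. The sketch below assumes the function is $f(x, y) = xy/\sqrt{x^2 + y^2}$ with $f(0,0) = 0$; the helper names are hypothetical. It evaluates the difference quotients that define the partials at the origin, and the ratio from the definition of the derivative along the diagonal points $(a, a)$, using the candidate derivative $(0, 0)$.

```python
import math

def f(x, y):
    """The function of Example 1.55."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / math.hypot(x, y)

def partial_x_quotient_at_origin(t):
    """(f(t, 0) - f(0, 0)) / t, whose limit as t -> 0 is the partial with respect to x."""
    return (f(t, 0.0) - f(0.0, 0.0)) / t

def partial_y_quotient_at_origin(t):
    """(f(0, t) - f(0, 0)) / t, whose limit as t -> 0 is the partial with respect to y."""
    return (f(0.0, t) - f(0.0, 0.0)) / t

def remainder_ratio(x, y):
    """|f(x, y) - f(0, 0) - (0, 0) . (x, y)| / ||(x, y)||, the ratio that would
    have to tend to zero if f were differentiable at the origin."""
    return abs(f(x, y) - f(0.0, 0.0) - (0.0 * x + 0.0 * y)) / math.hypot(x, y)

steps = [10.0 ** (-p) for p in range(1, 9)]
partial_quotients = [(partial_x_quotient_at_origin(t), partial_y_quotient_at_origin(t))
                     for t in steps]
diagonal_ratios = [remainder_ratio(a, a) for a in steps]
```

Every difference quotient along the axes is zero, so both partials exist and vanish at the origin, yet the ratio along the diagonal stays at $1/2$ (up to rounding) however small $a$ is taken.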


Intuitively, the feature that drives this example is that in looking at the partial
derivative of $f$ with respect to (say) $x$ at a point $(x, y)$, we are moving along only
the line through $(x, y)$ parallel to the $x$-axis (see the line denoted $l_1$ in Figure 1.6).
Similarly, the partial derivative with respect to $y$ involves holding the $x$ variable
fixed, and moving only on the line through $(x, y)$ parallel to the $y$-axis (see the line
denoted $l_2$ in Figure 1.6). On the other hand, in looking at the derivative $Df$, both the
$x$ and $y$ variables are allowed to vary simultaneously (for instance, along the dotted
curve in Figure 1.6).

Lastly, it is worth stressing that although a function must be continuous in order
to be differentiable (this is easy to see from the definitions), there is no implication
in the other direction whatsoever. Extreme examples exist of functions which are
continuous on all of $\mathbb{R}$, but fail to be differentiable at even a single point (see,
for example, Rudin, 1976, Theorem 7.18, p.154). Such functions are by no means
pathological; they play, for instance, a central role in the study of Brownian motion
in probability theory (with probability one, a Brownian motion path is everywhere
continuous and nowhere differentiable).


Fig. 1.6. Partial Derivatives and Differentiability


1.4.4 Directional Derivatives and Differentiability 
Let f: S → R, where S ⊂ Rⁿ is open. Let x be any point in S, and let h ∈ Rⁿ. The
directional derivative of f at x in the direction h is defined as

\[ \lim_{t \to 0+} \frac{f(x + th) - f(x)}{t} \]

when this limit exists, and is denoted Df(x; h). (The notation t → 0+ is shorthand
for t > 0, t → 0.)

When the condition t → 0+ is replaced with t → 0, we obtain what is sometimes
called the “two-sided directional derivative.” Observe that partial derivatives are a
special case of two-sided directional derivatives: when h = e_i for some i, the
two-sided directional derivative at x is precisely the partial derivative ∂f(x)/∂x_i.

In the previous subsection, it was pointed out that the existence of all partial 
derivatives at a point x is not sufficient to ensure that f is differentiable at x. It is 
actually true that not even the existence of all two-sided directional derivatives at x 
implies that f is differentiable at x (see the Exercises for an example). However, the 
following relationship in the reverse direction is easy to show and is left to the reader
as an exercise.

Theorem 1.56 Suppose f is differentiable at x ∈ S. Then, for any h ∈ Rⁿ, the
(one-sided) directional derivative Df(x; h) of f at x in the direction h exists, and,
in fact, we have Df(x; h) = Df(x)·h.


An immediate corollary is:

Corollary 1.57 If Df(x) exists, then Df(x; h) = −Df(x; −h).
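
Theorem 1.56 and Corollary 1.57 are easy to check numerically. The following is a minimal sketch under stated assumptions: the test function f, the point x, and the direction h are hypothetical choices made only for illustration, and the one-sided limit is approximated by a difference quotient with a small fixed t > 0.

```python
def f(x):
    """Hypothetical smooth test function on R^2: f(x1, x2) = x1^2 * x2 + 3 * x2."""
    return x[0] ** 2 * x[1] + 3.0 * x[1]


def grad_f(x):
    """Derivative Df(x) of the test function, computed by hand."""
    return [2.0 * x[0] * x[1], x[0] ** 2 + 3.0]


def directional_quotient(func, x, h, t=1e-6):
    """One-sided difference quotient (f(x + t*h) - f(x)) / t for a small t > 0."""
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (func(shifted) - func(x)) / t


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


if __name__ == "__main__":
    x, h = [1.0, 2.0], [0.5, -1.0]
    minus_h = [-hi for hi in h]
    print("Df(x).h         :", dot(grad_f(x), h))
    print("quotient for h  :", directional_quotient(f, x, h))
    print("quotient for -h :", directional_quotient(f, x, minus_h))
```

The quotient in the direction h approximates Df(x)·h, and the quotient in the direction −h approximates its negative, as the corollary asserts.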


Remark What is the relationship between Df(x) and the two-sided directional
derivative of f at x in an arbitrary direction h?


1.4.5 Higher Order Derivatives 


Let f be a function from S ⊂ Rⁿ to R, where S is an open set. Throughout this
subsection, we will assume that f is differentiable on all of S, so that the derivative
Df = [∂f/∂x₁, ..., ∂f/∂xₙ] itself defines a function from S to Rⁿ.

Suppose now that there is x ∈ S such that the derivative Df is itself differentiable
at x, i.e., such that for each i, the function ∂f/∂x_i: S → R is differentiable at x.
Denote the partial of ∂f/∂x_i in the direction e_j at x by ∂²f(x)/∂x_j∂x_i, if i ≠ j, and
∂²f(x)/∂x_i², if i = j. Then, we say that f is twice-differentiable at x, with second
derivative D²f(x), where

\[ D^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2} \end{bmatrix}. \]

Once again, we shall follow standard practice and denote D²f(x) by f″(x) whenever
n = 1 (i.e., if S ⊂ R).

If f is twice-differentiable at each x in S, we say that f is twice-differentiable on
S. When f is twice-differentiable on S, and for each i, j = 1, ..., n, the cross-partial
∂²f/∂x_i∂x_j is a continuous function from S to R, we say that f is twice continuously
differentiable on S, and we write f is C².

When f is C², the second-derivative D²f, which is also called the matrix of
cross-partials (or the hessian of f at x), has the following useful property:


Theorem 1.58 If f: D → R is a C² function, D²f is a symmetric matrix, i.e., we
have

\[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x) \]

for all i, j = 1, ..., n, and for all x ∈ D.

Proof See Rudin (1976, Corollary to Theorem 9.41, p.236). □


For an example where the symmetry of D? f fails because the cross-partials fail 
to be continuous, see the Exercises. 
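
As an illustration of the symmetry property (not of the counterexample in the Exercises), the sketch below takes a hypothetical C² function, f(x, y) = x³y² + sin(xy), uses its hand-computed first partials, and differentiates each of them numerically in the other variable. The function and the step size are assumptions chosen for the example.

```python
import math


def f_x(x, y):
    """Hand-computed partial of f(x, y) = x^3 y^2 + sin(xy) with respect to x."""
    return 3.0 * x ** 2 * y ** 2 + y * math.cos(x * y)


def f_y(x, y):
    """Hand-computed partial of the same f with respect to y."""
    return 2.0 * x ** 3 * y + x * math.cos(x * y)


def cross_partials(x, y, h=1e-5):
    """Central differences for d(f_x)/dy and d(f_y)/dx at the point (x, y)."""
    d_fx_dy = (f_x(x, y + h) - f_x(x, y - h)) / (2.0 * h)
    d_fy_dx = (f_y(x + h, y) - f_y(x - h, y)) / (2.0 * h)
    return d_fx_dy, d_fy_dx


if __name__ == "__main__":
    first, second = cross_partials(1.0, 0.5)
    print("d/dy of df/dx:", first)
    print("d/dx of df/dy:", second)
```

The two estimates agree up to discretization error, which is what Theorem 1.58 predicts for a C² function.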

The condition that the partials should be continuous for D²f to be a symmetric
matrix can be weakened a little. In particular, for

\[ \frac{\partial^2 f}{\partial x_j \partial x_k}(y) = \frac{\partial^2 f}{\partial x_k \partial x_j}(y) \]

to hold, it suffices just that (a) the partials ∂f/∂x_j and ∂f/∂x_k exist everywhere on D,
and (b) that one of the cross-partials ∂²f/∂x_j∂x_k or ∂²f/∂x_k∂x_j exist everywhere
on D and be continuous at y.

Still higher derivatives (third, fourth, etc.) may be defined for a function f: Rⁿ →
R. The underlying idea is simple: for instance, a function is thrice-differentiable at
a point x if all the component functions of its second-derivative D²f (i.e., if all the
cross-partial functions ∂²f/∂x_i∂x_j) are themselves differentiable at x; it is C³ if all
these component functions are continuously differentiable, etc. On the other hand,
the notation becomes quite complex unless n = 1 (i.e., f: R → R), and we do not
have any use in this book for derivatives beyond the second, so we will not attempt
formal definitions here.


1.5 Quadratic Forms: Definite and Semidefinite Matrices 
1.5.1 Quadratic Forms and Definiteness 
A quadratic form on Rⁿ is a function g_A on Rⁿ of the form

\[ g_A(x) = x'Ax = \sum_{i,j=1}^{n} a_{ij} x_i x_j \]

where A = (a_ij) is any symmetric n × n matrix. Since the quadratic form g_A is
completely specified by the matrix A, we henceforth refer to A itself as the quadratic
form. Our interest in quadratic forms arises from the fact that if f is a C² function,
and z is a point in the domain of f, then the matrix of second partials D²f(z) defines
a quadratic form (this follows from Theorem 1.58 on the symmetry property of D²f
for a C² function f).
A quadratic form A is said to be

1. positive definite if we have x′Ax > 0 for all x ∈ Rⁿ, x ≠ 0.
2. positive semidefinite if we have x′Ax ≥ 0 for all x ∈ Rⁿ, x ≠ 0.
3. negative definite if we have x′Ax < 0 for all x ∈ Rⁿ, x ≠ 0.
4. negative semidefinite if we have x′Ax ≤ 0 for all x ∈ Rⁿ, x ≠ 0.

The terms “nonnegative definite” and “nonpositive definite” are often used in place
of “positive semidefinite” and “negative semidefinite” respectively.
For instance, the quadratic form A defined by

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

is positive definite, since for any x = (x₁, x₂) ∈ R², we have x′Ax = x₁² + x₂², and
this quantity is positive whenever x ≠ 0. On the other hand, consider the quadratic
form

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]

For any x = (x₁, x₂) ∈ R², we have x′Ax = x₁², so x′Ax can be zero even if x ≠ 0.
(For example, x′Ax = 0 if x = (0, 1).) Thus, A is not positive definite. On the other
hand, it is certainly true that we always have x′Ax ≥ 0, so A is positive semidefinite.

Observe that there exist matrices A which are neither positive semidefinite nor
negative semidefinite, and that do not, therefore, fit into any of the four categories we
have identified. Such matrices are called indefinite quadratic forms. As an example
of an indefinite quadratic form A, consider

\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]

For x = (1, 1), x′Ax = 2 > 0, so A is not negative semidefinite. But for x = (−1, 1),
x′Ax = −2 < 0, so A is not positive semidefinite either.

Given a quadratic form A and any t ∈ R, we have (tx)′A(tx) = t²x′Ax, so the
quadratic form has the same sign along lines through the origin. Thus, in particular,
A is positive definite (resp. negative definite) if and only if it satisfies x′Ax > 0
(resp. x′Ax < 0) for all x in the unit sphere C = {u ∈ Rⁿ | ‖u‖ = 1}. We will use
this observation to show that if A is a positive definite (or negative definite) n × n
matrix, so is any other quadratic form B which is sufficiently close to A:


Theorem 1.59 Let A be a positive definite n × n matrix. Then, there is γ > 0
such that if B is any symmetric n × n matrix with |b_jk − a_jk| < γ for all j, k ∈
{1, ..., n}, then B is also positive definite. A similar statement holds for negative
definite matrices A.


Proof We will make use of the Weierstrass Theorem, which is proved in Chapter 3
(see Theorem 3.1). The Weierstrass Theorem states that if K ⊂ Rⁿ is compact, and
f: K → R is a continuous function, then f has both a maximum and a minimum
on K, i.e., there exist points k′ and k* in K such that f(k′) ≥ f(k) ≥ f(k*) for all
k ∈ K.

Now, the unit sphere C is clearly compact, and the quadratic form A is continuous
on this set. Therefore, by the Weierstrass Theorem, there is z in C such that for any
x ∈ C, we have

\[ z'Az \le x'Ax. \]

If A is positive definite, then z′Az must be strictly positive, so there must exist ε > 0
such that x′Ax ≥ ε > 0 for all x ∈ C.

Define γ = ε/(2n²) > 0. Let B be any symmetric n × n matrix, which is such that
|b_jk − a_jk| < γ for all j, k = 1, ..., n. Then, for any x ∈ C,

\[
\begin{aligned}
|x'(B - A)x| &= \Big| \sum_{j,k=1}^{n} (b_{jk} - a_{jk}) x_j x_k \Big| \\
&\le \sum_{j,k=1}^{n} |b_{jk} - a_{jk}| \, |x_j| \, |x_k| \\
&< \gamma \sum_{j,k=1}^{n} |x_j| \, |x_k| \\
&\le \gamma n^2 = \varepsilon/2.
\end{aligned}
\]

Therefore, for any x ∈ C,

\[ x'Bx = x'Ax + x'(B - A)x \ge \varepsilon - \varepsilon/2 = \varepsilon/2 \]

so B is also positive definite, and the desired result is established. □
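
The bound in the proof can be illustrated for n = 2. The sketch below is an assumption-laden example rather than part of the argument: it uses the fact that, for a symmetric 2 × 2 matrix, the minimum of x′Mx over the unit circle is the smaller eigenvalue (available in closed form), picks a hypothetical matrix A, sets γ = ε/(2n²), and perturbs the entries of A by 0.9γ in every sign pattern that preserves symmetry.

```python
import itertools
import math


def min_on_unit_circle(M):
    """Minimum of x'Mx over unit vectors x for a symmetric 2 x 2 matrix M.

    For a symmetric matrix this minimum is the smaller eigenvalue, which has a
    closed form in the 2 x 2 case.
    """
    a, b, d = M[0][0], M[0][1], M[1][1]
    return (a + d) / 2.0 - math.hypot((a - d) / 2.0, b)


def check_perturbations(A, scale=0.9):
    """Return (eps, worst), where eps is the minimum of x'Ax on the unit circle
    and worst is the smallest such minimum over the perturbed matrices B."""
    n = 2
    eps = min_on_unit_circle(A)
    gamma = eps / (2.0 * n ** 2)
    step = scale * gamma  # each entry moves by strictly less than gamma
    worst = float("inf")
    for s11, s12, s22 in itertools.product((-1.0, 1.0), repeat=3):
        B = [[A[0][0] + s11 * step, A[0][1] + s12 * step],
             [A[1][0] + s12 * step, A[1][1] + s22 * step]]
        worst = min(worst, min_on_unit_circle(B))
    return eps, worst


if __name__ == "__main__":
    eps, worst = check_perturbations([[2.0, 1.0], [1.0, 2.0]])
    print("eps =", eps, " smallest minimum of x'Bx =", worst, " eps/2 =", eps / 2.0)
```

For this A, every perturbed matrix B keeps x′Bx above ε/2 on the unit circle, in line with the final inequality of the proof.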


A particular implication of this result, which we will use in Chapter 4 in the study 
of unconstrained optimization problems, is the following: 


Corollary 1.60 If f is a C² function such that at some point x, D²f(x) is a positive
definite matrix, then there is a neighborhood B(x, r) of x such that for all y ∈ B(x, r),
D²f(y) is also a positive definite matrix. A similar statement holds if D²f(x) is,
instead, a negative definite matrix.


Finally, it is important to point out that Theorem 1.59 is no longer true if “positive
definite” is replaced by “positive semidefinite.” Consider, as a counterexample, the
matrix A defined by

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]

We have seen above that A is positive semidefinite (but not positive definite). Pick
any γ > 0. Then, for ε = γ/2, the matrix

\[ B = \begin{bmatrix} 1 & 0 \\ 0 & -\varepsilon \end{bmatrix} \]

satisfies |a_ij − b_ij| < γ for all i, j. However, B is not positive semidefinite: for
x = (x₁, x₂), we have x′Bx = x₁² − εx₂², and this quantity can be negative (for
instance, if x₁ = 0 and x₂ ≠ 0). Thus, there is no neighborhood of A such that all
quadratic forms in that neighborhood are also positive semidefinite.


1.5.2 Identifying Definiteness and Semidefiniteness 


From a practical standpoint, it is of interest to ask: what restrictions on the structure of
A are imposed by the requirement that A be a positive (or negative) definite quadratic
form? We provide answers to this question in this section. The results we present
are, in fact, equivalence statements, that is, quadratic forms possess the required
definiteness or semidefiniteness property if and only if they meet the conditions we
outline.

The first result deals with positive and negative definiteness. Given an n × n
symmetric matrix A, let A_k denote the k × k submatrix of A that is obtained when
only the first k rows and columns are retained, i.e., let

\[ A_k = \begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{bmatrix}. \]

We will refer to A_k as the k-th naturally ordered principal minor of A.


Theorem 1.61 An n × n symmetric matrix A is
1. negative definite if and only if (−1)^k |A_k| > 0 for all k ∈ {1, ..., n}.
2. positive definite if and only if |A_k| > 0 for all k ∈ {1, ..., n}.

Moreover, a positive semidefinite quadratic form A is positive definite if and only if
|A| ≠ 0, while a negative semidefinite quadratic form is negative definite if and only
if |A| ≠ 0.

Proof See Debreu (1952, Theorem 2, p.296). □
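
The test in Theorem 1.61 translates directly into a short routine. The following is a minimal sketch, assuming the input matrix is symmetric (the theorem does not apply otherwise); the determinant is computed by cofactor expansion, which is adequate only for small n, and all function names are illustrative.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def leading_minors(A):
    """Determinants |A_1|, ..., |A_n| of the naturally ordered principal minors."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def is_positive_definite(A):
    """Theorem 1.61, part 2: |A_k| > 0 for every k (A assumed symmetric)."""
    return all(m > 0 for m in leading_minors(A))


def is_negative_definite(A):
    """Theorem 1.61, part 1: (-1)^k |A_k| > 0 for every k (A assumed symmetric)."""
    return all((-1) ** k * m > 0 for k, m in enumerate(leading_minors(A), start=1))


if __name__ == "__main__":
    A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    print(leading_minors(A), is_positive_definite(A), is_negative_definite(A))
```

For the tridiagonal matrix in the usage lines the leading minors are 2, 3, and 4, so the matrix is positive definite; negating every entry gives a negative definite matrix.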


A natural conjecture is that this theorem would continue to hold if the words
“negative definite” and “positive definite” were replaced with “negative semidefinite” and
“positive semidefinite,” respectively, provided the strict inequalities were replaced
with weak ones. This conjecture is false. Consider the following example:

Example 1.62 Let

\[ A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}. \]

Then, A and B are both symmetric matrices. Moreover, |A₁| = |A₂| = |B₁| =
|B₂| = 0, so if the conjecture were true, both A and B would pass the test for positive
semidefiniteness, as well as the test for negative semidefiniteness. However, for any
x ∈ R², x′Ax = x₂² and x′Bx = −x₂². Therefore, A is positive semidefinite but not
negative semidefinite, while B is negative semidefinite, but not positive semidefinite.
□


Roughly speaking, the feature driving this counterexample is that, in both the
matrices A and B, the zero entries in all but the (2,2)-place of the matrix make the
determinants of order 1 and 2 both zero. In particular, no play is given to the sign of
the entry in the (2,2)-place, which is positive in one case, and negative in the other.
On the other hand, an examination of the expressions x′Ax and x′Bx reveals that in
both cases, the sign of the quadratic form is determined precisely by the sign of the
(2,2)-entry.

This problem points to the need to expand the set of submatrices that we consider, if
we are to obtain an analog of Theorem 1.61 for positive and negative semidefiniteness.
Let an n × n symmetric matrix A be given, and let π = (π₁, ..., πₙ) be a permutation
of the integers {1, ..., n}. Denote by A^π the symmetric n × n matrix obtained by
applying the permutation π to both the rows and columns of A:

\[ A^{\pi} = \begin{bmatrix} a_{\pi_1 \pi_1} & \cdots & a_{\pi_1 \pi_n} \\ \vdots & \ddots & \vdots \\ a_{\pi_n \pi_1} & \cdots & a_{\pi_n \pi_n} \end{bmatrix}. \]

For k ∈ {1, ..., n}, let A^π_k denote the k × k symmetric submatrix of A^π obtained by
retaining only the first k rows and columns:

\[ A^{\pi}_k = \begin{bmatrix} a_{\pi_1 \pi_1} & \cdots & a_{\pi_1 \pi_k} \\ \vdots & \ddots & \vdots \\ a_{\pi_k \pi_1} & \cdots & a_{\pi_k \pi_k} \end{bmatrix}. \]

Finally, let Π denote the set of all possible permutations of {1, ..., n}.

Theorem 1.63 A symmetric n × n matrix A is

1. positive semidefinite if and only if |A^π_k| ≥ 0 for all k ∈ {1, ..., n} and for all
π ∈ Π.

2. negative semidefinite if and only if (−1)^k |A^π_k| ≥ 0 for all k ∈ {1, ..., n} and for
all π ∈ Π.

Proof See Debreu (1952, Theorem 7, p.298). □
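
The permutation test of Theorem 1.63 can be sketched as follows, again assuming a symmetric input and small n (the number of permutations grows factorially). The function names are illustrative; the test matrices in the usage lines are the ones from Example 1.62.

```python
import itertools


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def permuted_leading_minors(A):
    """Yield (k, |A^pi_k|) for every permutation pi and every k = 1, ..., n."""
    n = len(A)
    for pi in itertools.permutations(range(n)):
        for k in range(1, n + 1):
            idx = pi[:k]
            yield k, det([[A[i][j] for j in idx] for i in idx])


def is_positive_semidefinite(A):
    """Theorem 1.63, part 1 (A assumed symmetric)."""
    return all(m >= 0 for _, m in permuted_leading_minors(A))


def is_negative_semidefinite(A):
    """Theorem 1.63, part 2 (A assumed symmetric)."""
    return all((-1) ** k * m >= 0 for k, m in permuted_leading_minors(A))


if __name__ == "__main__":
    A = [[0, 0], [0, 1]]
    B = [[0, 0], [0, -1]]
    print(is_positive_semidefinite(A), is_negative_semidefinite(A))
    print(is_positive_semidefinite(B), is_negative_semidefinite(B))
```

Unlike the naturally ordered minors alone, the permuted minors pick up the sign of the (2,2)-entry, so A and B from Example 1.62 are classified correctly.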


One final remark is important. The symmetry assumption is crucial to the validity
of these results. If it fails, a matrix A might pass all the tests for (say) positive
semidefiniteness without actually being positive semidefinite. Here are two examples:

Example 1.64 Let

\[ A = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}. \]

Note that |A₁| = 1, and |A₂| = (1)(1) − (−3)(0) = 1, so A passes the test for
positive definiteness. However, A is not a symmetric matrix, and is not, in fact,
positive definite: we have x′Ax = x₁² + x₂² − 3x₁x₂ which is negative for x = (1, 1).
□

Example 1.65 Let

\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. \]

There are only two possible permutations of the set {1, 2}, namely, {1, 2} itself, and
{2, 1}. This gives rise to four different submatrices, whose determinants we have to
consider:

\[ [a_{11}], \quad [a_{22}], \quad \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} a_{22} & a_{21} \\ a_{12} & a_{11} \end{bmatrix}. \]

It is an easy matter to check that the determinants of all four of these are nonnegative,
so A passes the test for positive semidefiniteness. However, A is not positive
semidefinite: we have x′Ax = x₁x₂, which could be positive or negative. □

1.6 Some Important Results 


This section brings together some results of importance for the study of optimization
theory. Subsection 1.6.1 discusses a class of results called “separation theorems” for
convex sets in Rⁿ. Subsection 1.6.2 summarizes some important consequences of
assuming the continuity and/or differentiability of real-valued functions defined on
Rⁿ. Finally, Subsection 1.6.3 outlines two fundamental results known as the Inverse
Function Theorem and the Implicit Function Theorem.


1.6.1 Separation Theorems 
Let p ≠ 0 be a vector in Rⁿ, and let a ∈ R. The set H defined by

\[ H = \{ x \in \mathbb{R}^n \mid p \cdot x = a \} \]

is called a hyperplane in Rⁿ, and will be denoted H(p, a).

A hyperplane in R², for example, is simply a straight line: if p ∈ R² and a ∈ R, the
hyperplane H(p, a) is simply the set of points (x₁, x₂) that satisfy p₁x₁ + p₂x₂ = a.
Similarly, a hyperplane in R³ is a plane.

A set D in Rⁿ is said to be bounded by a hyperplane H(p, a) if D lies entirely on
one side of H(p, a), i.e., if either

\[ p \cdot x \le a, \quad \text{for all } x \in D, \]

or

\[ p \cdot x \ge a, \quad \text{for all } x \in D. \]

If D is bounded by H(p, a) and D ∩ H(p, a) ≠ ∅, then H(p, a) is said to be a
supporting hyperplane for D.


Example 1.66 Let D = {(x, y) ∈ R²₊ | xy ≥ 1}. Let p be the vector (1, 1), and let
a = 2. Then, the hyperplane

\[ H(p, a) = \{ (x, y) \in \mathbb{R}^2 \mid x + y = 2 \} \]

bounds D: if xy ≥ 1 and x, y ≥ 0, then we must have (x + y) ≥ (x + x⁻¹) ≥ 2. In
fact, H(p, a) is a supporting hyperplane for D since H(p, a) and D have the point
(x, y) = (1, 1) in common. □
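
A quick grid check of Example 1.66 is sketched below. The grid spacing, the upper bound of 5, and the function names are arbitrary illustrative choices; a finite grid can only suggest, not prove, that the hyperplane bounds D.

```python
def in_D(x, y):
    """Membership in D = {(x, y) in R^2_+ : xy >= 1}."""
    return x >= 0 and y >= 0 and x * y >= 1


def signed_side(p, a, point):
    """p . point - a: zero on H(p, a), positive on one side, negative on the other."""
    return sum(pi * zi for pi, zi in zip(p, point)) - a


def min_side_over_grid(p, a, steps=100, upper=5.0):
    """Smallest value of p . z - a over grid points z that lie in D."""
    values = []
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = upper * i / steps, upper * j / steps
            if in_D(x, y):
                values.append(signed_side(p, a, (x, y)))
    return min(values)


if __name__ == "__main__":
    p, a = (1.0, 1.0), 2.0
    print("smallest p.z - a over the grid:", min_side_over_grid(p, a))
    print("value at (1, 1):", signed_side(p, a, (1.0, 1.0)))
```

No sampled point of D falls strictly below the line x + y = 2, and the point (1, 1) lies on it, which is the supporting-hyperplane property.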


Two sets D and E in Rⁿ are said to be separated by the hyperplane H(p, a) in Rⁿ
if D and E lie on opposite sides of H(p, a), i.e., if we have

\[ p \cdot y \le a, \quad \text{for all } y \in D, \]
\[ p \cdot z \ge a, \quad \text{for all } z \in E. \]

If D and E are separated by H(p, a) and one of the sets (say, E) consists of just a
single point x, we will indulge in a slight abuse of terminology and say that H(p, a)
separates the set D and the point x.

A final definition is required before we state the main results of this section. Given
a set X ⊂ Rⁿ, the closure of X, denoted X°, is defined to be the intersection of all
closed sets containing X, i.e., if

\[ \mathcal{A}(X) = \{ Y \subset \mathbb{R}^n \mid Y \text{ is closed and } X \subset Y \}, \]

then

\[ X^{\circ} = \bigcap_{Y \in \mathcal{A}(X)} Y. \]

Intuitively, the closure of X is the “smallest” closed set that contains X. Since the
arbitrary intersection of closed sets is closed, X° is closed for any set X. Note that
X° = X if and only if X is itself closed.

The following results deal with the separation of convex sets by hyperplanes. They 
play a significant role in the study of inequality-constrained optimization problems 
under convexity restrictions (see Chapter 7). 


Theorem 1.67 Let D be a nonempty convex set in Rⁿ, and let x* be a point in
Rⁿ that is not in D. Then, there is a hyperplane H(p, a) in Rⁿ with p ≠ 0 which
separates D and x*. We may, if we desire, choose p to also satisfy ‖p‖ = 1.


Proof We first prove the result for the case where D is a closed set. Then, using
this result, we establish the general case. The proof will make use of the Weierstrass
Theorem (see Theorem 3.1 in Chapter 3), which states that if K ⊂ Rⁿ is compact,
and f: K → R is a continuous function, then f has both a maximum and a minimum
on K, i.e., there exist points k′ and k* in K such that f(k′) ≥ f(k) ≥ f(k*) for all
k ∈ K.

So suppose D is a nonempty, closed, convex set in Rⁿ, and x* ∉ D. We claim that
there exists y* ∈ D such that

\[ d(x^*, y^*) \le d(x^*, y), \quad y \in D. \]

Indeed, this is an easy consequence of the Weierstrass Theorem. Let B(x*, r) denote
the closed ball with center x* and radius r. Pick r > 0 sufficiently large so that the
set Y = B(x*, r) ∩ D is nonempty. Since B(x*, r) and D are both closed sets, so
is Y. Since B(x*, r) is bounded, and Y ⊂ B(x*, r), Y is also bounded. Therefore,
Y is compact. The function f: Y → R defined by f(y) = d(x*, y) is clearly
continuous on Y, since the Euclidean metric d is continuous. Thus, by the Weierstrass
Theorem, there exists a minimum y* of f on Y, i.e., there exists y* in Y such
that

\[ d(x^*, y^*) \le d(x^*, y), \quad y \in Y. \]

If y ∈ D and y ∉ Y, then we must have y ∉ B(x*, r), so d(x*, y) > r ≥ d(x*, y*),
where the last inequality holds because y* ∈ B(x*, r). Therefore,
we have established, as required, that

\[ d(x^*, y^*) \le d(x^*, y), \quad y \in D. \]

Now let p = y* − x* and let a = p·y*. We will show that the hyperplane H(p, a)
separates D and x*. To this end, note first that

\[
\begin{aligned}
p \cdot x^* &= (y^* - x^*) \cdot x^* \\
&= -(y^* - x^*) \cdot (y^* - x^*) + y^* \cdot (y^* - x^*) \\
&= -\|p\|^2 + a \\
&< a.
\end{aligned}
\]


To complete the proof of the theorem for the case where D is closed, we will now
show that p·y ≥ a for all y ∈ D. Pick any y ∈ D. Since y* ∈ D and D is convex,
it is the case that for all λ ∈ (0, 1), the point y(λ) = λy + (1 − λ)y* is also in D. By
definition of y*, we then have

\[ d(x^*, y^*) \le d(x^*, y(\lambda)). \]

By expanding the terms in this inequality, we obtain

\[
\begin{aligned}
\|x^* - y^*\|^2 &\le \|x^* - y(\lambda)\|^2 \\
&= \|x^* - \lambda y - (1 - \lambda) y^*\|^2 \\
&= \|\lambda (x^* - y) + (1 - \lambda)(x^* - y^*)\|^2 \\
&= \lambda^2 \|x^* - y\|^2 + 2\lambda(1 - \lambda)(x^* - y) \cdot (x^* - y^*) + (1 - \lambda)^2 \|x^* - y^*\|^2.
\end{aligned}
\]

Rearranging terms, this gives us

\[ 0 \le \lambda^2 \|x^* - y\|^2 + 2\lambda(1 - \lambda)(x^* - y) \cdot (x^* - y^*) - \lambda(2 - \lambda)\|x^* - y^*\|^2. \]

Dividing both sides by λ > 0, we get

\[ 0 \le \lambda \|x^* - y\|^2 + 2(1 - \lambda)(x^* - y) \cdot (x^* - y^*) - (2 - \lambda)\|x^* - y^*\|^2. \]

Taking limits as λ → 0, and dividing the result by 2, results in the inequality

\[
\begin{aligned}
0 &\le (x^* - y) \cdot (x^* - y^*) - \|x^* - y^*\|^2 \\
&= (x^* - y^*) \cdot (x^* - y - x^* + y^*) \\
&= (x^* - y^*) \cdot (y^* - y).
\end{aligned}
\]

Rearranging terms once again, and using the definitional relations p = (y* − x*)
and p·y* = a, we finally obtain

\[ p \cdot y = (y^* - x^*) \cdot y \ge (y^* - x^*) \cdot y^* = a. \]


Thus, we have shown that there is a hyperplane H(p, a) that separates D and x*
when D is closed. It remains to be shown that we may, without loss of generality,
take ‖p‖ to be unity. Suppose ‖p‖ ≠ 1. We will show the existence of (p̃, ã) such
that H(p̃, ã) also separates D and x*, and which further satisfies ‖p̃‖ = 1. Since
p ≠ 0, we have ‖p‖ > 0. Define p̃ = p/‖p‖ and ã = a/‖p‖. Then, we have
‖p̃‖ = 1 and

\[ \tilde{p} \cdot x^* = \frac{p \cdot x^*}{\|p\|} \le \frac{a}{\|p\|} = \tilde{a}, \]

while for y ∈ D, we also have

\[ \tilde{p} \cdot y = \frac{p \cdot y}{\|p\|} \ge \frac{a}{\|p\|} = \tilde{a}. \]

Thus, H(p̃, ã) also separates D and x*, as required, and we have shown that the
vector p in the separating hyperplane may be taken to have unit norm without loss of
generality. This completes the proof of the theorem for the case where D is closed.
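
The construction used in the closed case (take the point y* of D closest to x*, then set p = y* − x* and a = p·y*) can be traced on a concrete set. The sketch below assumes D is the closed unit disc in R², chosen only because its nearest-point map has a closed form; the point x* and the sampling scheme are likewise illustrative.

```python
import math


def project_onto_unit_disc(x):
    """Closest point of the closed unit disc to x (Euclidean distance)."""
    norm = math.hypot(x[0], x[1])
    if norm <= 1.0:
        return list(x)
    return [x[0] / norm, x[1] / norm]


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def separating_hyperplane(x_star):
    """Return (p, a) with p = y* - x* and a = p . y*, as in the proof."""
    y_star = project_onto_unit_disc(x_star)
    p = [yi - xi for yi, xi in zip(y_star, x_star)]
    return p, dot(p, y_star)


def min_over_disc(p, samples=200):
    """Smallest value of p . y over sampled points y of the unit disc."""
    best = float("inf")
    for i in range(samples):
        angle = 2.0 * math.pi * i / samples
        for radius in (0.0, 0.5, 1.0):
            y = [radius * math.cos(angle), radius * math.sin(angle)]
            best = min(best, dot(p, y))
    return best


if __name__ == "__main__":
    x_star = [2.0, 1.0]
    p, a = separating_hyperplane(x_star)
    print("p =", p, " a =", a)
    print("p . x* =", dot(p, x_star), "(should be below a)")
    print("min of p . y over samples =", min_over_disc(p), "(should be at least a)")
```

For x* = (2, 1) the value p·x* falls strictly below a while every sampled point of the disc satisfies p·y ≥ a (up to rounding), matching the two inequalities established in the proof.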

Now, suppose D is not closed. Let D° denote the closure of D. As the closure of a
convex set, D° is also convex. If x* ∉ D°, then by the arguments we have just given,
there exists a hyperplane that separates the closed set D° and x*. Since D ⊂ D°,
this implies D and x* can be separated.

Finally, suppose x* ∉ D, but x* ∈ D°. Then, for any r > 0, there must exist a
point x(r) ∈ B(x*, r) such that x(r) ∉ D°. Therefore, there exists p(r) ∈ Rⁿ with
‖p(r)‖ = 1 such that

\[ p(r) \cdot x(r) \le p(r) \cdot y, \quad y \in D^{\circ}. \]

Pick a sequence {r_k} with r_k > 0 and r_k → 0. For notational ease, let p_k = p(r_k)
and x_k = x(r_k). Since ‖p_k‖ = 1 for all k, each p_k lies in the compact set

\[ C = \{ z \in \mathbb{R}^n \mid \|z\| = 1 \}. \]

Therefore, there is a subsequence m(k) of k, and a point p ∈ C such that p_{m(k)} →
p. Since we have

\[ p_{m(k)} \cdot x_{m(k)} \le p_{m(k)} \cdot y, \quad y \in D^{\circ}, \]

and the inner product is continuous, by taking limits as k → ∞, we obtain

\[ p \cdot x^* \le p \cdot y, \quad y \in D^{\circ}. \]

Since D ⊂ D°, this implies

\[ p \cdot x^* \le p \cdot y, \quad y \in D. \]

If we define a = p·x*, we have shown that the hyperplane H(p, a) separates D
and x*. □

Theorem 1.68 Let D and E be convex sets in Rⁿ such that D ∩ E = ∅. Then, there
exists a hyperplane H(p, a) in Rⁿ which separates D and E. We may, if we desire,
choose p to also satisfy ‖p‖ = 1.


Proof Let F = D + (−E), where, in obvious notation, −E is the set

\[ \{ y \in \mathbb{R}^n \mid -y \in E \}. \]

By Theorem 1.38, the convexity of D and E implies that F is also convex. We claim
that 0 ∉ F. If we had 0 ∈ F, then there would exist points x ∈ D and y ∈ E such that
x − y = 0. But this implies x = y, so x ∈ D ∩ E, which contradicts the assumption
that D ∩ E is empty. Therefore, 0 ∉ F.

By Theorem 1.67, there exists p ∈ Rⁿ such that

\[ p \cdot 0 \le p \cdot z, \quad z \in F. \]

This is the same thing as

\[ p \cdot y \le p \cdot x, \quad x \in D, \; y \in E. \]

It follows that sup_{y∈E} p·y ≤ inf_{x∈D} p·x. If a ∈ [sup_{y∈E} p·y, inf_{x∈D} p·x], the
hyperplane H(p, a) separates D and E.

That p can also be chosen to satisfy ‖p‖ = 1 is established in the same way as in
Theorem 1.67, and is left to the reader as an exercise. □


1.6.2 The Intermediate and Mean Value Theorems 


The Intermediate Value Theorem asserts that a continuous real function on an interval 
assumes all intermediate values on the interval. Figure 1.7 illustrates the result. 


Theorem 1.69 (Intermediate Value Theorem) Let D = [a, b] be an interval in
R and let f: D → R be a continuous function. If f(a) < f(b), and if c is a real
number such that f(a) < c < f(b), then there exists x ∈ (a, b) such that f(x) = c.
A similar statement holds if f(a) > f(b).

Proof See Rudin (1976, Theorem 4.23, p.93). □
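
The Intermediate Value Theorem only asserts that a point x with f(x) = c exists, but for a continuous f the standard bisection method locates one numerically. The sketch below is a minimal illustration under the theorem's hypotheses (f continuous, f(a) < c < f(b)); the example function and tolerance are arbitrary choices.

```python
def solve_by_bisection(f, a, b, c, tol=1e-12):
    """Find x in [a, b] with f(x) close to c, assuming f is continuous
    and f(a) < c < f(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < c:
            lo = mid  # the target value is attained to the right of mid
        else:
            hi = mid  # the target value is attained at or to the left of mid
    return (lo + hi) / 2.0


def example_f(x):
    """Hypothetical continuous test function."""
    return x ** 3 + x


if __name__ == "__main__":
    x = solve_by_bisection(example_f, 0.0, 2.0, 5.0)
    print("x =", x, " f(x) =", example_f(x))
```

Each step keeps an interval on which f passes from below c to at least c, so the theorem guarantees that the shrinking interval always contains a solution.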


Remark It might appear at first glance that the intermediate value property actually
characterizes continuous functions, i.e., that a function f: [a, b] → R is continuous
if and only if for any two points x₁ < x₂ and for any real number c lying between
f(x₁) and f(x₂), there is x ∈ (x₁, x₂) such that f(x) = c. The Intermediate Value
Theorem shows that the “only if” part is true. It is left to the reader to show that the
converse, namely the “if” part, is actually false. (Hint: Use Example 1.51.) □

[Fig. 1.7. The Intermediate Value Theorem]


We have seen in Example 1.51 that a function may be differentiable everywhere,
but may fail to be continuously differentiable. The following result (which may be
regarded as an Intermediate Value Theorem for the derivative) states, however, that the
derivative must still have some minimal continuity properties, viz., that the derivative
must assume all intermediate values. In particular, it shows that the derivative f′ of
an everywhere differentiable function f cannot have jump discontinuities.


Theorem 1.70 (Intermediate Value Theorem for the Derivative) Let D = [a, b]
be an interval in R, and let f: D → R be a function that is differentiable everywhere
on D. If f′(a) < f′(b), and if c is a real number such that f′(a) < c < f′(b),
then there is a point x ∈ (a, b) such that f′(x) = c. A similar statement holds if
f′(a) > f′(b).

Proof See Rudin (1976, Theorem 5.12, p.108). □


It is very important to emphasize that Theorem 1.70 does not assume that f is
a C¹ function. Indeed, if f were C¹, the result would be a trivial consequence of
the Intermediate Value Theorem, since the derivative f′ would then be a continuous
function on D.

The next result, the Mean Value Theorem, provides another property that the
derivative must satisfy. A graphical representation of this result is provided in
Figure 1.8. As with Theorem 1.70, it is assumed only that f is everywhere differentiable
on its domain D, and not that it is C¹.

[Fig. 1.8. The Mean Value Theorem]


Theorem 1.71 (Mean Value Theorem) Let D = [a, b] be an interval in R, and
let f: D → R be a continuous function. Suppose f is differentiable on (a, b). Then,
there exists x ∈ (a, b) such that

\[ f(b) - f(a) = (b - a) f'(x). \]

Proof See Rudin (1976, Theorem 5.10, p.108). □
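
For a concrete function the point promised by the Mean Value Theorem can be located numerically. The sketch below assumes, beyond the theorem's hypotheses, that f′ is continuous and that f′ minus the chord slope changes sign between a and b, so that bisection applies; this holds for the hypothetical example f(x) = x³ on [0, 2] but not for every f.

```python
import math


def mean_value_point(df, a, b, slope, tol=1e-12):
    """Bisection on df(x) - slope over [a, b].

    Assumes df is continuous and that df(a) - slope and df(b) - slope have
    opposite signs (an extra assumption, satisfied by the example below).
    """
    lo, hi = a, b
    lo_is_negative = df(lo) - slope < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if (df(mid) - slope < 0) == lo_is_negative:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def f(x):
    """Hypothetical test function."""
    return x ** 3


def df(x):
    """Derivative of the test function."""
    return 3.0 * x ** 2


if __name__ == "__main__":
    a, b = 0.0, 2.0
    slope = (f(b) - f(a)) / (b - a)
    x = mean_value_point(df, a, b, slope)
    print("x =", x, " expected 2/sqrt(3) =", 2.0 / math.sqrt(3.0))
```

Here the chord slope is 4 and the derivative 3x² equals 4 at x = 2/√3, which lies strictly inside (0, 2).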


The following generalization of the Mean Value Theorem is known as Taylor’s
Theorem. It may be regarded as showing that a many-times differentiable function
can be approximated by a polynomial. The notation f^(k)(z) is used in the statement
of Taylor’s Theorem to denote the k-th derivative of f evaluated at the point z. When
k = 0, f^(k)(x) should be interpreted simply as f(x).

Theorem 1.72 (Taylor’s Theorem) Let f: D → R be a C^m function, where D
is an open interval in R, and m ≥ 0 is a nonnegative integer. Suppose also that
f^(m+1)(z) exists for every point z ∈ D. Then, for any x, y ∈ D, there is z ∈ (x, y)
such that

\[ f(y) = \sum_{k=0}^{m} \frac{f^{(k)}(x)(y - x)^k}{k!} + \frac{f^{(m+1)}(z)(y - x)^{m+1}}{(m+1)!}. \]

Proof See Rudin (1976, Theorem 5.15, p.110). □

Each of the results stated in this subsection, with the obvious exception of the
Intermediate Value Theorem for the Derivative, also has an n-dimensional version. We
state these versions here, deriving their proofs as consequences of the corresponding
result in R.


Theorem 1.73 (The Intermediate Value Theorem in Rⁿ) Let D ⊂ Rⁿ be a convex
set, and let f: D → R be continuous on D. Suppose that a and b are points in D such
that f(a) < f(b). Then, for any c such that f(a) < c < f(b), there is λ̂ ∈ (0, 1)
such that f((1 − λ̂)a + λ̂b) = c.

Proof We derive this result as a consequence of the Intermediate Value Theorem in
R. Let g: [0, 1] → R be defined by g(λ) = f((1 − λ)a + λb), λ ∈ [0, 1]. Since f
is a continuous function, g is evidently continuous on [0, 1]. Moreover, g(0) = f(a)
and g(1) = f(b), so g(0) < c < g(1). By the Intermediate Value Theorem in R,
there exists λ̂ ∈ (0, 1) such that g(λ̂) = c. Since g(λ̂) = f((1 − λ̂)a + λ̂b), we are
done. □


An n-dimensional version of the Mean Value Theorem is similarly established: 


Theorem 1.74 (The Mean Value Theorem in Rⁿ) Let D ⊂ Rⁿ be open and convex,
and let f: D → R be a function that is differentiable everywhere on D. Then, for any
a, b ∈ D, there is λ ∈ (0, 1) such that

\[ f(b) - f(a) = Df((1 - \lambda)a + \lambda b) \cdot (b - a). \]

Proof For notational ease, let z(λ) = (1 − λ)a + λb. Define g: [0, 1] → R by
g(λ) = f(z(λ)) for λ ∈ [0, 1]. Note that g(0) = f(a) and g(1) = f(b). Since f
is everywhere differentiable by hypothesis, it follows that g is differentiable at all
λ ∈ [0, 1], and in fact, g′(λ) = Df(z(λ))·(b − a). By the Mean Value Theorem for
functions of one variable, therefore, there is λ′ ∈ (0, 1) such that

\[ g(1) - g(0) = g'(\lambda')(1 - 0) = g'(\lambda'). \]

Substituting for g in terms of f, this is precisely the statement that

\[ f(b) - f(a) = Df(z(\lambda')) \cdot (b - a). \]

The theorem is proved. □
Finally, we turn to Taylor’s Theorem in Rⁿ. A complete statement of this result
requires some new notation, and is also irrelevant for the remainder of this book.
We confine ourselves, therefore, to stating two special cases that are useful for our
purposes.

Theorem 1.75 (Taylor’s Theorem in Rⁿ) Let f: D → R, where D is an open set
in Rⁿ. If f is C¹ on D, then it is the case that for any x, y ∈ D, we have

\[ f(y) = f(x) + Df(x)(y - x) + R_1(x, y), \]

where the remainder term R₁(x, y) has the property that

\[ \lim_{y \to x} \left( \frac{R_1(x, y)}{\|x - y\|} \right) = 0. \]

If f is C², this statement can be strengthened to

\[ f(y) = f(x) + Df(x)(y - x) + \frac{1}{2}(y - x)' D^2 f(x)(y - x) + R_2(x, y), \]

where the remainder term R₂(x, y) has the property that

\[ \lim_{y \to x} \left( \frac{R_2(x, y)}{\|x - y\|^2} \right) = 0. \]
Proof Fix any x ∈ D, and define the function F(·) on D by

\[ F(y) = f(x) + Df(x) \cdot (y - x). \]

Let h(y) = f(y) − F(y). Since f and F are C¹, so is h. Note that h(x) = Dh(x) = 0.
The first part of the theorem will be proved if we show that

\[ \frac{h(y)}{\|y - x\|} \to 0 \text{ as } y \to x, \]

or, equivalently, if we show that for any ε > 0, there is δ > 0 such that

\[ \|y - x\| < \delta \text{ implies } |h(y)| \le \varepsilon \|x - y\|. \]

So let ε > 0 be given. By the continuity of h and Dh, there is δ > 0 such that

\[ \|y - x\| < \delta \text{ implies } |h(y)| < \varepsilon \text{ and } \|Dh(y)\| < \varepsilon. \]

Fix any y satisfying ‖y − x‖ < δ. Define a function g on [0, 1] by

\[ g(t) = h[(1 - t)x + ty]. \]

(See footnote 5.) Then, g(0) = h(x) = 0. Moreover, g is C¹ with
g′(t) = Dh[(1 − t)x + ty]·(y − x).
Now note that ‖(1 − t)x + ty − x‖ = t‖y − x‖ < δ for all t ∈ [0, 1], since
‖x − y‖ < δ. Therefore, ‖Dh[(1 − t)x + ty]‖ < ε for all t ∈ [0, 1], and it follows
that |g′(t)| ≤ ε‖y − x‖ for all t ∈ [0, 1].

Footnote 5: We are implicitly assuming here that g(·) is well-defined, i.e., that
(1 − t)x + ty ∈ D for all t ∈ [0, 1]. This assumption is without loss of generality.
To see this, note that since D is open and x ∈ D, there is r > 0 such that
B(x, r) ⊂ D. By shrinking δ if need be, we can ensure that δ < r. Then it is
evidently the case that for all y satisfying ‖y − x‖ < δ, the line segment joining
y and x (i.e., the set of points (1 − t)x + ty for t ∈ [0, 1]) is completely
contained in D.

By Taylor’s Theorem in R, there is t* ∈ (0, 1) such that

\[ g(1) = g(0) + g'(t^*)(1 - 0) = g'(t^*). \]

Therefore,

\[ |h(y)| = |g(1)| = |g'(t^*)| \le \varepsilon \|y - x\|. \]

Since y was an arbitrary point satisfying ‖y − x‖ < δ, the first part of the theorem is
proved.

The second part may be established analogously. The details are left as an exercise.
□
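
The two remainder properties in Theorem 1.75 can be observed numerically. The sketch below uses a hypothetical C² function with hand-computed first and second derivatives, and evaluates R₁/‖y − x‖ and R₂/‖y − x‖² along a sequence of points y approaching x; the function, base point, and direction are assumptions made for the illustration.

```python
import math


def f(x):
    """Hypothetical C^2 test function f(x1, x2) = exp(x1) * x2 + x1 * x2^2."""
    return math.exp(x[0]) * x[1] + x[0] * x[1] ** 2


def grad_f(x):
    """Hand-computed derivative Df(x)."""
    return [math.exp(x[0]) * x[1] + x[1] ** 2,
            math.exp(x[0]) + 2.0 * x[0] * x[1]]


def hess_f(x):
    """Hand-computed second derivative D^2 f(x)."""
    off_diag = math.exp(x[0]) + 2.0 * x[1]
    return [[math.exp(x[0]) * x[1], off_diag],
            [off_diag, 2.0 * x[0]]]


def remainder_ratios(x, y):
    """Return (R1 / ||y - x||, R2 / ||y - x||^2) for the two expansions."""
    d = [yi - xi for yi, xi in zip(y, x)]
    norm = math.sqrt(sum(di * di for di in d))
    linear = sum(gi * di for gi, di in zip(grad_f(x), d))
    H = hess_f(x)
    quadratic = 0.5 * sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    r1 = f(y) - f(x) - linear
    r2 = r1 - quadratic
    return r1 / norm, r2 / norm ** 2


if __name__ == "__main__":
    x = [0.0, 1.0]
    for s in (1e-1, 1e-2, 1e-3):
        y = [x[0] + s, x[1] + 2.0 * s]
        print("s = %g:" % s, remainder_ratios(x, y))
```

Both ratios shrink roughly in proportion to the step s, consistent with the limits stated in the theorem.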


1.6.3 The Inverse and Implicit Function Theorems 


We now state two results of much importance especially for “comparative statics” 
exercises. The second of these results (the Implicit Function Theorem) also plays a 
central role in proving Lagrange’s Theorem on the first-order conditions for equality- 
constrained optimization problems (see Chapter 5 below). Some new terminology 
is, unfortunately, required first. 

Given a function f: A → B, we say that the function f maps A onto B, if for
any b ∈ B, there is some a ∈ A such that f(a) = b. We say that f is a one-to-one
function if for any b ∈ B, there is at most one a ∈ A such that f(a) = b. If
f: A → B is both one-to-one and onto, then it is easy to see that there is a (unique)
function g: B → A such that f(g(b)) = b for all b ∈ B. (Note that we also have
g(f(a)) = a for all a ∈ A.) The function g is called the inverse function of f.


Theorem 1.76 (Inverse Function Theorem) Let $f: S \to \mathbb{R}^n$ be a $C^1$ function,
where $S \subset \mathbb{R}^n$ is open. Suppose there is a point $y \in S$ such that the $n \times n$ matrix
$Df(y)$ is invertible. Let $x = f(y)$. Then:

1. There are open sets $U$ and $V$ in $\mathbb{R}^n$ such that $x \in U$, $y \in V$, $f$ is one-to-one on
$V$, and $f(V) = U$.

2. The inverse function $g: U \to V$ of $f$ is a $C^1$ function on $U$, whose derivative at
any point $\hat{x} \in U$ satisfies

$$Dg(\hat{x}) = \left(Df(\hat{y})\right)^{-1}, \quad \text{where } f(\hat{y}) = \hat{x}.$$

Proof See Rudin (1976, Theorem 9.24, p.221). □
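
The derivative formula in part 2 can be checked numerically in the simplest case $n = 1$, where invertibility of $Df(y)$ just means $f'(y) \neq 0$. The following is a minimal sketch under stated assumptions: the map $f(y) = y^3 + y$, the bisection-based inverse, and all names are illustrative choices and do not come from the text.

```python
# Illustrative check of Dg(x_hat) = (Df(y_hat))^(-1) for n = 1.
# The function f below is a hypothetical example, not one from the text.

def f(y):
    """A C^1 map on R with f'(y) = 3y^2 + 1 > 0, hence one-to-one."""
    return y**3 + y

def df(y):
    """Derivative of f."""
    return 3 * y**2 + 1

def g(x, lo=-10.0, hi=10.0, tol=1e-13):
    """Inverse of f on [lo, hi], computed by bisection (f is strictly increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y_hat = 1.5
x_hat = f(y_hat)

# Central-difference approximation to the derivative of the inverse at x_hat.
step = 1e-5
dg_numeric = (g(x_hat + step) - g(x_hat - step)) / (2 * step)

# Value predicted by the theorem.
dg_theorem = 1.0 / df(y_hat)
```

The two numbers `dg_numeric` and `dg_theorem` should agree up to the finite-difference and bisection error.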


Turning now to the Implicit Function Theorem, the question this result addresses
may be motivated by a simple example. Let $S = \mathbb{R}^2_{++}$ and let $f: S \to \mathbb{R}$ be defined
by $f(x,y) = xy$. Pick any point $(\bar{x}, \bar{y}) \in S$, and consider the "level set"

$$C(\bar{x}, \bar{y}) = \{(x,y) \in S \mid f(x,y) = f(\bar{x}, \bar{y})\}.$$

If we now define the function $h: \mathbb{R}_{++} \to \mathbb{R}$ by $h(y) = f(\bar{x}, \bar{y})/y$, we have

$$f(h(y), y) = f(\bar{x}, \bar{y}), \quad y \in \mathbb{R}_{++}.$$

Thus, the values of the $x$-variable on the level set $C(\bar{x}, \bar{y})$ can be represented explicitly
in terms of the values of the $y$-variable on this set, through the function $h$.

In general, an exact form for the original function $f$ may not be specified (for
instance, we may only know that $f$ is an increasing $C^1$ function on $\mathbb{R}^2$), so we may
not be able to solve for $h$ explicitly. The question arises whether at least an implicit
representation of the function $h$ would exist in such a case.

The Implicit Function Theorem studies this problem in a general setting. That is,
it looks at level sets of functions $f$ from $S \subset \mathbb{R}^m$ to $\mathbb{R}^k$, where $m > k$, and asks
when the values of some of the variables in the domain can be represented in terms
of the others, on a given level set. Under very general conditions, it proves that at
least a local representation is possible.

The statement of the theorem requires a little more notation. Given integers $m \ge 1$
and $n \ge 1$, let a typical point in $\mathbb{R}^{m+n}$ be denoted by $(x,y)$, where $x \in \mathbb{R}^m$ and
$y \in \mathbb{R}^n$. For a $C^1$ function $F$ mapping some subset of $\mathbb{R}^{m+n}$ into $\mathbb{R}^n$, let $DF_y(x,y)$
denote that portion of the derivative matrix $DF(x,y)$ corresponding to the last $n$
variables. Note that $DF_y(x,y)$ is an $n \times n$ matrix. $DF_x(x,y)$ is defined similarly.


Theorem 1.77 (Implicit Function Theorem) Let $F: S \subset \mathbb{R}^{m+n} \to \mathbb{R}^n$ be a $C^1$
function, where $S$ is open. Let $(x^*, y^*)$ be a point in $S$ such that $DF_y(x^*, y^*)$ is
invertible, and let $F(x^*, y^*) = c$. Then, there is a neighborhood $U \subset \mathbb{R}^m$ of $x^*$ and
a $C^1$ function $g: U \to \mathbb{R}^n$ such that (i) $(x, g(x)) \in S$ for all $x \in U$, (ii) $g(x^*) = y^*$,
and (iii) $F(x, g(x)) = c$ for all $x \in U$. The derivative of $g$ at any $x \in U$ may be
obtained from the chain rule, by differentiating the identity $F(x, g(x)) = c$ with respect to $x$:

$$Dg(x) = -\left(DF_y(x,y)\right)^{-1} \cdot DF_x(x,y), \quad \text{where } y = g(x).$$

Proof See Rudin (1976, Theorem 9.28, p.224). □
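
As a worked illustration, take the motivating example $F(x,y) = xy$ with $m = n = 1$. Differentiating $F(x, g(x)) = c$ by the chain rule gives $DF_x + DF_y \cdot Dg = 0$, i.e., $Dg(x) = -(DF_y)^{-1} DF_x$ evaluated at $(x, g(x))$. The sketch below compares this expression with a finite-difference derivative of the explicit solution $g(x) = c/x$. The base point $(2, 3)$, the evaluation point, and the function names are assumptions made for illustration only.

```python
# Illustrative check of the implicit-function derivative for F(x, y) = x*y.
# The numbers and helper names are hypothetical choices, not from the text.

def F(x, y):
    return x * y

def dF_x(x, y):
    """Partial derivative of F with respect to x."""
    return y

def dF_y(x, y):
    """Partial derivative of F with respect to y (nonzero whenever x != 0)."""
    return x

x_star, y_star = 2.0, 3.0
c = F(x_star, y_star)

def g(x):
    """Explicit solution of F(x, g(x)) = c, available here because F(x, y) = x*y."""
    return c / x

def dg_formula(x):
    """Dg(x) = -(DF_y)^(-1) * DF_x, evaluated at (x, g(x))."""
    y = g(x)
    return -dF_x(x, y) / dF_y(x, y)

x0 = 2.5
step = 1e-6
dg_numeric = (g(x0 + step) - g(x0 - step)) / (2 * step)
```

In this example $g'(x) = -c/x^2$, which matches $-y/x$ at $y = c/x$; the minus sign is what makes the two expressions agree.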


1.7 Exercises 


1. Show that equality obtains in the Cauchy-Schwarz inequality if and only if the
vectors $x$ and $y$ are collinear (i.e., either $x = ay$ for some $a \in \mathbb{R}$ or $y = bx$ for
some $b \in \mathbb{R}$). (Note: This is a difficult question. Prove the result for unit vectors
first, i.e., for vectors $x$ and $y$ which satisfy $x \cdot x = y \cdot y = 1$. Then reduce the
general case to this one.)


2. Suppose a real-valued sequence $\{x_k\}$ satisfies $x_k > 0$ for all $k$, and $x$ is any limit
point of the sequence $\{x_k\}$. Under what further conditions on $\{x_k\}$ can we assert
that $x > 0$?

3. Let $\{x_k\}$, $\{y_k\}$ be sequences in $\mathbb{R}^n$ such that $x_k \to x$ and $y_k \to y$. For each $k$, let
$z_k = x_k + y_k$, and let $w_k = x_k \cdot y_k$. Show that $z_k \to (x + y)$ and $w_k \to x \cdot y$.

4. Give an example of a sequence $\{x_k\}$ which has exactly $n$ limit points, where
$n \in \{1, 2, \ldots\}$.

5. Give an example of a nonconvergent real-valued sequence $\{x_k\}$ such that $\{x_k\}$
contains at least one convergent subsequence, and every convergent subsequence
of $\{x_k\}$ converges to the limit $x = 0$.

6. Let $x, y \in \mathbb{R}^n$. Show that for any sequences $\{x_k\}$, $\{y_k\}$ such that $x_k \to x$ and
$y_k \to y$, we have $\lim_{k \to \infty} d(x_k, y_k) = d(x, y)$, where $d$ is the Euclidean metric
on $\mathbb{R}^n$.

7. Provide a formal proof that if a sequence $\{x_k\}$ converges to a limit $x$, then every
subsequence of $\{x_k\}$ also converges to $x$.

8. Show that $\limsup_k x_k = -\liminf_k (-x_k)$ for any sequence $\{x_k\}$ in $\mathbb{R}$.

9. Given two sequences $\{a_k\}$ and $\{b_k\}$ in $\mathbb{R}$, show that

$$\limsup_k (a_k + b_k) \le \limsup_k a_k + \limsup_k b_k$$
$$\liminf_k (a_k + b_k) \ge \liminf_k a_k + \liminf_k b_k.$$

Give examples where the strict inequalities obtain.

10. Let $\{a_k\}$ and $\{b_k\}$ be two real-valued sequences such that $a_k, b_k > 0$ for all $k$.
What is the relationship between $\limsup_k (a_k b_k)$ and $(\limsup_k a_k)(\limsup_k b_k)$?
What if $a_k, b_k < 0$ for all $k$?

11. Let $\{a_k\}$ be a real-valued sequence, and $\alpha \in \mathbb{R}$. When is it the case that
$\limsup_k (\alpha a_k) = \alpha \limsup_k a_k$?

12. In each of the following cases, give an example of a real-valued sequence $\{x_k\}$
meeting the stated properties:

(a) $\limsup_k x_k = \liminf_k x_k = +\infty$.
(b) $\limsup_k x_k = +\infty$ and $\liminf_k x_k = -\infty$.
(c) $\limsup_k x_k = +\infty$ and $\liminf_k x_k = 0$.
(d) $\limsup_k x_k = 0$ and $\liminf_k x_k = -\infty$.

or show that no such sequence can exist.

13. Find the lim sup and the lim inf of each of the following sequences:

(a) $x_k = (-1)^k$, $k = 1, 2, \ldots$
(b) $x_k = k(-1)^k$, $k = 1, 2, \ldots$
(c) $x_k = (-1)^k + 1/k$, $k = 1, 2, \ldots$
(d) $x_k = 1$ if $k$ is odd, and $x_k = -k/2$ if $k$ is even, $k = 1, 2, \ldots$

14. Find the lim sup and lim inf of the following sequence: 1, 1, 2, 1, 2, 3, 1, 2, 3, 4, ...

15. Let $\{x_k\}$ be a bounded sequence of real numbers. Let $S \subset \mathbb{R}$ be the set which
consists only of members of the sequence $\{x_k\}$, i.e., $x \in S$ if and only if $x = x_k$
for some $k$. What is the relationship between $\limsup_k x_k$ and $\sup S$?

16. Find the supremum, infimum, maximum, and minimum of the set $X$ in each of
the following cases:

(a) $X = \{x \in [0, 1) \mid x \text{ is irrational}\}$.
(b) $X = \{x \mid x = 1/n,\ n = 1, 2, \ldots\}$.
(c) $X = \{x \mid x = 1 - 1/n,\ n = 1, 2, \ldots\}$.
(d) $X = \{x \in [0, \pi] \mid \sin x > 1/2\}$.


17. Prove that the "closed interval" $[0,1]$ is, in fact, a closed set, and that the "open
interval" $(0,1)$ is, in fact, an open set. Prove also that $[0,1)$ and $(0,1]$ are neither
open nor closed.


18. Consider the set $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$ of nonnegative integers viewed as a subset
of $\mathbb{R}$. Is it closed, open, or neither?


19. Is $\mathbb{R}^n$ viewed as a subset of itself an open set? Is it a closed set? Explain your
answer.


20. Give an example of an open set in $\mathbb{R}$ that is not convex.
21. Give an example of a compact set in $\mathbb{R}$ that is not convex.


22. A set $X \subset \mathbb{R}^n$ is said to be connected if there do not exist two nonempty open
sets $X_1$ and $X_2$ such that $X_1 \cap X_2 = \emptyset$ and $X_1 \cup X_2 \supset X$. Show that every
connected set in $\mathbb{R}$ must be convex. Give an example of a connected set in $\mathbb{R}^2$
that is not convex.


23. Let $X \subset \mathbb{R}$ be an open set. Show that if a finite number of points $\{x_1, \ldots, x_l\}$
are removed from $X$, the remaining set is still open. Is this true if we remove a
countable infinity of points $\{x_1, x_2, \ldots\}$ from $X$?

24. Let $A_n$ be the interval $(0, 1/n)$. Provide a formal proof that $\bigcap_{n=1}^{\infty} A_n$ is empty.

25. Let $B \subset \mathbb{R}^2$ be as follows:

$$B = \{(x,y) \in \mathbb{R}^2 : y = \sin(1/x),\ x > 0\} \cup \{(0,0)\}.$$

Is $B$ closed? open? bounded? compact?


26. Let $A \subset \mathbb{R}^2$ be defined as

$$A = \{(x,y) \in \mathbb{R}^2 : 1 < x < 2 \text{ and } y = x\}.$$

Is $A$ open? bounded? compact?
27. Let $A = [-1, 0)$ and $B = (0, 1)$. Examine whether each of the following
statements is true or false.
(a) $A \cup B$ is compact;
(b) $A + B = \{x + y : x \in A,\ y \in B\}$ is compact;
(c) $A \cap B$ is compact.


28. Let $A \subset \mathbb{R}^n$ be given. Show that there is a smallest closed set, $\bar{A}$, which contains
$A$, i.e., $\bar{A}$ is a closed set containing $A$ and if $C$ is a closed set containing $A$ then
$A \subset \bar{A} \subset C$. The set $\bar{A}$ is called the closure of $A$.


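29. Let $A \subset \mathbb{R}^2$. Let $B \subset \mathbb{R}$ be defined by

$$B = \{x \in \mathbb{R} \mid \text{there is } y \in \mathbb{R} \text{ such that } (x,y) \in A\}.$$

$B$ is called the projection of $A$ onto the $x$-axis.
(a) If $A$ is a closed set in $\mathbb{R}^2$, is $B$ necessarily a closed set in $\mathbb{R}$?
(b) If $A$ is an open set in $\mathbb{R}^2$, is $B$ necessarily an open set in $\mathbb{R}$?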
30. Given two subsets $A$ and $B$ of $\mathbb{R}$, recall that their Cartesian product $A \times B \subset \mathbb{R}^2$
is defined as

$$A \times B = \{(a,b) \mid a \in A,\ b \in B\}.$$

Give an example of a set $X \subset \mathbb{R}^2$ that cannot be expressed as the Cartesian
product of sets $A, B \subset \mathbb{R}$.


31. Give an example of an infinite collection of compact sets whose union is bounded
but not compact.

32. Let $A = \{1, 1/2, 1/3, \ldots, 1/n, \ldots\} \cup \{0\}$. Is $A$ closed? Is it compact?

33. Give an example of a countably infinite collection of bounded sets whose union
is bounded and an example where the union is unbounded.

34. Say that a set $S$ in $\mathbb{R}^n$ has the finite intersection property if the following condition
holds: given an arbitrary collection of closed sets $(S_\alpha)_{\alpha \in A}$ in $\mathbb{R}^n$, it is the case
that whenever every finite subcollection $(S_\alpha)_{\alpha \in F}$ has nonempty intersection with
$S$, then the entire collection has nonempty intersection with $S$, i.e., whenever

$$\Big(\bigcap_{\alpha \in F} S_\alpha\Big) \cap S \neq \emptyset$$

for every finite subset $F$ of $A$, then we also have

$$\Big(\bigcap_{\alpha \in A} S_\alpha\Big) \cap S \neq \emptyset.$$

Show that the finite intersection property is equivalent to compactness, i.e., that
a set $S$ is compact if and only if it has the finite intersection property.


35. Show that a set $S \subset \mathbb{R}$ is convex if, and only if, it is an interval, i.e., it is of
the form $[a, b]$, $[a, b)$, $(a, b]$, or $(a, b)$. (We do not preclude the possibility that
$a = -\infty$ and/or $b = +\infty$.)


36. Show that if $S \subset \mathbb{R}^n$ is convex (i.e., the convex combination of any two points
in $S$ is also in $S$), then the convex combination of any finite collection of points
from $S$ is also in $S$.

37. Give an example of a bounded and convex set $S \subset \mathbb{R}$ such that $\sup S \in S$ but
$\inf S \notin S$.


38. Let $A$ and $B$ be $2 \times 2$ matrices. Identify general conditions on the entries $(a_{ij})$ and
$(b_{ij})$ under which $AB = BA$. Using these conditions, find a numerical example
of such matrices $A$ and $B$.

39. Let $A$ be a given $n \times n$ nonsingular matrix, and let $\alpha A$ be the matrix whose $(i,j)$-th
element is $\alpha a_{ij}$, $i, j = 1, \ldots, n$. What is the relationship, if any, between $(\alpha A)^{-1}$
and $A^{-1}$?


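40. Find the rank of each of the following matrices by finding the size of the largest
submatrix which has a nonzero determinant:

$$A = \begin{bmatrix} 3 & 6 & 4 & 8 \\ 2 & 7 & 1 & 9 \\ 4 & 2 & 5 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 18 & 2 \\ 7 & -4 \\ 6 & 11 \end{bmatrix}, \quad
C = \begin{bmatrix} -5 & 8 & 1 & 0 \\ 13 & 3 & 0 & 2 \\ 10 & 0 & -6 & 2 \end{bmatrix}$$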

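41. Using the first method described in subsection 1.3.6, find the determinant of each
of the following three matrices. Verify that you get the same answers if you use
the second method.

$$A = \begin{bmatrix} -1 & 3 & 2 \\ 6 & -2 & 3 \\ 7 & 10 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} \cdot & \cdot & \cdot \\ 0 & 4 & 1 \\ -7 & 2 & 3 \end{bmatrix}, \quad
C = \begin{bmatrix} \cdot & \cdot & \cdot \\ 3 & 3 & -9 \\ 8 & 2 & 6 \end{bmatrix}$$

[The first rows of $B$ and $C$ are garbled in the source, which reads "i3 4 -4 4 n".]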


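42. Find the inverse of the following matrices:

$$A = \begin{bmatrix} 6 & 13 & 4 \\ 8 & 15 & 2 \\ 7 & 14 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 15 & 24 & 12 \\ 10 & 16 & 10 \\ 2 & 3 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 8 & 3 \\ 11 & 4 & 0 \end{bmatrix}$$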


43. Let $S$ be a finite set. Show that any function $f: S \to \mathbb{R}^n$ is continuous on $S$.
What if $S$ is countably infinite?


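44. Let $f$ be a function from $\mathbb{R}^n$ to $\mathbb{R}^m$. For $B \subset \mathbb{R}^m$, define $f^{-1}(B)$ by $f^{-1}(B) = \{x \in \mathbb{R}^n \mid f(x) \in B\}$. Show that for any subsets $A_1, A_2$ of $\mathbb{R}^n$ and $B_1, B_2$ of
$\mathbb{R}^m$:

(a) $f(A_1 \cap A_2) \subseteq f(A_1) \cap f(A_2)$.

(b) $f(A_1 \cup A_2) = f(A_1) \cup f(A_2)$.

(c) $f^{-1}(B_1 \cup B_2) = f^{-1}(B_1) \cup f^{-1}(B_2)$.

(d) $f^{-1}(B_1^c) = [f^{-1}(B_1)]^c$.

(e) $f^{-1}(B_1 \cap B_2) = f^{-1}(B_1) \cap f^{-1}(B_2)$.

(f) $A_1 \subseteq f^{-1}(f(A_1))$.

(g) $f(f^{-1}(B_1)) \subseteq B_1$.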

45. Let $f: \mathbb{R}^n \to \mathbb{R}$ be continuous at a point $p \in \mathbb{R}^n$. Assume $f(p) > 0$. Show
that there is an open ball $B \subset \mathbb{R}^n$ such that $p \in B$, and for all $x \in B$, we have
$f(x) > 0$.

46. Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is a continuous function. Show that the set

$$\{x \in \mathbb{R}^n \mid f(x) = 0\}$$

is a closed set.


47. Let $f: \mathbb{R} \to \mathbb{R}$ be defined by

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Find an open set $O$ such that $f^{-1}(O)$ is not open and find a closed set $C$ such
that $f^{-1}(C)$ is not closed.

48. Give an example of a function $f: \mathbb{R} \to \mathbb{R}$ which is continuous at exactly two
points (say, at 0 and 1), or show that no such function can exist.

49. Show that it is possible for two functions $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ to be
discontinuous, but for their product $f \cdot g$ to be continuous. What about their
composition $f \circ g$?


50. Let $f: \mathbb{R} \to \mathbb{R}$ be a function which satisfies

$$f(x + y) = f(x) f(y) \quad \text{for all } x, y \in \mathbb{R}.$$

Show that if $f$ is continuous at $x = 0$, then it is continuous at every point of
$\mathbb{R}$. Also show that if $f$ vanishes at a single point of $\mathbb{R}$, then $f$ vanishes at every
point of $\mathbb{R}$.


51. Let $f: \mathbb{R}_+ \to \mathbb{R}$ be defined by

[the piecewise definition of $f(x)$ is illegible in the source]

Show that $f$ is continuous at 0.


52. Let $D$ be the unit square $[0,1] \times [0,1]$ in $\mathbb{R}^2$. For $(s,t) \in D$, let $f(s,t)$ be
defined by

$$f(s, 0) = 0, \quad \text{for all } s \in [0,1],$$

and for $t > 0$,

[the cases of the definition for $s \in [0, t]$ are illegible in the source]

$$f(s,t) = 0 \quad \text{if } s \in (t, 1].$$

(Drawing a picture of $f$ for a fixed $t$ will help.) Show that $f$ is a separately
continuous function, i.e., for each fixed value of $t$, $f$ is continuous as a function
of $s$, and for each fixed value of $s$, $f$ is continuous in $t$. Show also that $f$ is not
jointly continuous in $s$ and $t$, i.e., show that there exists a point $(s,t) \in D$ and a
sequence $(s_n, t_n)$ in $D$ converging to $(s,t)$ such that $\lim_{n \to \infty} f(s_n, t_n) \neq f(s,t)$.


53. Let $f: \mathbb{R} \to \mathbb{R}$ be defined as

$$f(x) = \begin{cases} x & \text{if } x \text{ is irrational} \\ 1 - x & \text{if } x \text{ is rational.} \end{cases}$$

Show that $f$ is continuous at 1/2 but discontinuous elsewhere.


54. Let $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be continuous functions. Define $h: \mathbb{R}^n \to \mathbb{R}$ by
$h(x) = g[f(x)]$. Show that $h$ is continuous. Is it possible for $h$ to be continuous
even if $f$ and $g$ are not?

55. Show that if a function $f: \mathbb{R} \to \mathbb{R}$ satisfies

$$|f(x) - f(y)| \le M |x - y|^a$$

for some fixed $M > 0$ and $a > 1$, then $f$ is a constant function, i.e., $f(x)$ is
identically equal to some real number $b$ at all $x \in \mathbb{R}$.


56. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(0,0) = 0$, and for $(x,y) \neq (0,0)$,

$$f(x,y) = \frac{xy}{\sqrt{x^2 + y^2}}.$$

Show that the two-sided directional derivative of $f$ evaluated at $(x,y) = (0,0)$
exists in all directions $h \in \mathbb{R}^2$, but that $f$ is not differentiable at $(0,0)$.

57. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(0,0) = 0$, and for $(x,y) \neq (0,0)$,

$$f(x,y) = \frac{xy(x^2 - y^2)}{x^2 + y^2}.$$

Show that the cross-partials $\partial^2 f(x,y)/\partial x \partial y$ and $\partial^2 f(x,y)/\partial y \partial x$ exist at all
$(x,y) \in \mathbb{R}^2$, but that these partials are not continuous at $(0,0)$. Show also that

$$\frac{\partial^2 f}{\partial x \partial y}(0,0) \neq \frac{\partial^2 f}{\partial y \partial x}(0,0).$$

58. Show that an $n \times n$ symmetric matrix $A$ is a positive definite matrix if and only if
$-A$ is a negative definite matrix. ($-A$ refers to the matrix whose $(i,j)$-th entry
is $-a_{ij}$.)


59. Prove the following statement or provide a counterexample to show it is false: if
$A$ is a positive definite matrix, then $A^{-1}$ is a negative definite matrix.


60. Give an example of matrices $A$ and $B$ which are each negative semidefinite, but
not negative definite, and which are such that $A + B$ is negative definite.


61. Is it possible for a symmetric matrix $A$ to be simultaneously negative semidefinite
and positive semidefinite? If yes, give an example. If not, provide a proof.


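62. Examine the definiteness or semidefiniteness of the following quadratic forms:

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \qquad
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 0 \end{bmatrix}$$

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \qquad
A = \begin{bmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix}$$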


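63. Find the hessians $D^2 f$ of each of the following functions. Evaluate the hessians
at the specified points, and examine if the hessian is positive definite, negative
definite, positive semidefinite, negative semidefinite, or indefinite:

(a) $f: \mathbb{R}^2_+ \to \mathbb{R}$, $f(x) = x_1^2 + \sqrt{x_2}$, at $x = (1,1)$.

(b) $f: \mathbb{R}^2_+ \to \mathbb{R}$, $f(x) = (x_1 x_2)^{1/2}$, at an arbitrary point $x \in \mathbb{R}^2_{++}$.

(c) $f: \mathbb{R}^2_+ \to \mathbb{R}$, $f(x) = (x_1 x_2)^2$, at an arbitrary point $x \in \mathbb{R}^2_{++}$.

(d) $f: \mathbb{R}^3_+ \to \mathbb{R}$, $f(x) = \sqrt{x_1} + \sqrt{x_2} + \sqrt{x_3}$, at $x = (2,2,2)$.

(e) $f: \mathbb{R}^3_+ \to \mathbb{R}$, $f(x) = \sqrt{x_1 x_2 x_3}$, at $x = (2,2,2)$.

(f) $f: \mathbb{R}^3_+ \to \mathbb{R}$, $f(x) = x_1 x_2 + x_2 x_3 + x_3 x_1$, at $x = (1,1,1)$.

(g) $f: \mathbb{R}^3_+ \to \mathbb{R}$, $f(x) = a x_1 + b x_2 + c x_3$ for some constants $a, b, c \in \mathbb{R}$, at
$x = (2,2,2)$.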
Optimization in $\mathbb{R}^n$


This chapter constitutes the starting point of our investigation into optimization the- 
ory. Sections 2.1 and 2.2 introduce the notation we use to represent abstract optimiza- 
tion problems and their solutions. Section 2.3 then describes a number of examples 
of optimization problems drawn from economics and its allied disciplines, which are 
invoked at various points throughout the book to illustrate the use of the techniques 
we develop to identify and characterize solutions to optimization problems. Finally, 
Sections 2.4 and 2.5 describe the chief questions of interest that we examine over the 
next several chapters, and provide a roadmap for the rest of the book. 


2.1 Optimization Problems in $\mathbb{R}^n$


An optimization problem in $\mathbb{R}^n$, or simply an optimization problem, is one where
the values of a given function $f: \mathbb{R}^n \to \mathbb{R}$ are to be maximized or minimized over a
given set $D \subset \mathbb{R}^n$. The function $f$ is called the objective function, and the set $D$ the
constraint set. Notationally, we will represent these problems by

Maximize $f(x)$ subject to $x \in D$,

and

Minimize $f(x)$ subject to $x \in D$,

respectively. Alternatively, and more compactly, we shall also write

$$\max\{f(x) \mid x \in D\},$$

and

$$\min\{f(x) \mid x \in D\}.$$

Problems of the first sort are termed maximization problems and those of the second
sort are called minimization problems. A solution to the problem $\max\{f(x) \mid x \in D\}$
is a point $x$ in $D$ such that

$$f(x) \ge f(y) \quad \text{for all } y \in D.$$


We will say in this case that f attains a maximum on D at x, and also refer to x as a 
maximizer of f on D. 

Similarly,¹ a solution to the problem $\min\{f(x) \mid x \in D\}$ is a point $z$ in $D$ such
that

$$f(z) \le f(y) \quad \text{for all } y \in D.$$

We say in this case that $f$ attains a minimum on $D$ at $z$, and also refer to $z$ as a
minimizer of $f$ on $D$.

The set of attainable values of $f$ on $D$, denoted $f(D)$, is defined by

$$f(D) = \{w \in \mathbb{R} \mid \text{there is } x \in D \text{ such that } f(x) = w\}.$$

We will also refer to $f(D)$ as the image of $D$ under $f$. Observe that $f$ attains a
maximum on $D$ (at some $x$) if and only if the set of real numbers $f(D)$ has a well-
defined maximum, while $f$ attains a minimum on $D$ (at some $z$) if and only if $f(D)$
has a well-defined minimum. (This is simply a restatement of the definitions.)

The following simple examples reveal two important points: first, that in a given
maximization problem, a solution may fail to exist (that is, the problem may have no
solution at all), and, second, that even if a solution does exist, it need not necessarily
be unique (that is, there could exist more than one solution). Similar statements
obviously also hold for minimization problems.²


Example 2.1 Let $D = \mathbb{R}_+$ and $f(x) = x$ for $x \in D$. Then, $f(D) = \mathbb{R}_+$ and
$\sup f(D) = +\infty$, so the problem $\max\{f(x) \mid x \in D\}$ has no solution. □


Example 2.2 Let $D = [0, 1]$ and let $f(x) = x(1 - x)$ for $x \in D$. Then, the problem
of maximizing $f$ on $D$ has exactly one solution, namely the point $x = 1/2$. □


Example 2.3 Let $D = [-1, 1]$ and $f(x) = x^2$ for $x \in D$. The problem of maxi-
mizing $f$ on $D$ now has two solutions: $x = -1$ and $x = 1$. □


1 It frequently happens through this text that definitions or results stated for maximization problems have an
exact analog in the context of minimization problems. Rather than spell this analog out on each occasion,
we shall often leave it to the reader to fill in the missing details.

2 In this context, see Theorem 2.4 below.


In the sequel, therefore, we will not talk of the solution of a given optimization
problem, but of a set of solutions of the problem, with the understanding that this
set could, in general, be empty. The set of all maximizers of $f$ on $D$ will be denoted
$\arg\max\{f(x) \mid x \in D\}$:

$$\arg\max\{f(x) \mid x \in D\} = \{x \in D \mid f(x) \ge f(y) \text{ for all } y \in D\}.$$

The set $\arg\min\{f(x) \mid x \in D\}$ of minimizers of $f$ on $D$ is defined analogously.
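
The solution sets of Examples 2.1 to 2.3 can be explored on finite grids. The sketch below is only an illustration under stated assumptions: the grids replace the intervals $[0,1]$ and $[-1,1]$ by finitely many rational points, the truncation bounds for Example 2.1 are arbitrary, and the helper name `argmax_set` is hypothetical.

```python
from fractions import Fraction

def argmax_set(f, points):
    """All maximizers of f over a finite list of points (exact comparison)."""
    values = [f(x) for x in points]
    best = max(values)
    return [x for x, v in zip(points, values) if v == best]

# Finite stand-ins for the constraint sets of Examples 2.2 and 2.3.
grid_unit = [Fraction(k, 100) for k in range(101)]        # points of [0, 1]
grid_sym = [Fraction(k, 100) for k in range(-100, 101)]   # points of [-1, 1]

sol_22 = argmax_set(lambda x: x * (1 - x), grid_unit)   # a single maximizer
sol_23 = argmax_set(lambda x: x * x, grid_sym)          # two maximizers

# Example 2.1: on D = R_+ no maximizer exists. Truncating D to {0, ..., M}
# gives a maximizer at M, which escapes to infinity as M grows.
sol_21 = [argmax_set(lambda x: x, list(range(M + 1))) for M in (10, 100, 1000)]
```

On a finite set a maximizer always exists, so the grid can only illustrate non-existence indirectly, through maximizers that keep moving as the truncation bound grows.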

We close this section with two elementary, but important, observations, which 
we state in the form of theorems for ease of future reference. The first shows that 
every maximization problem may be represented as a minimization problem, and 
vice versa. The second identifies a transformation of the optimization problem under 
which the solution set remains unaffected. 


Theorem 2.4 Let $-f$ denote the function whose value at any $x$ is $-f(x)$. Then $x$ is
a maximum of $f$ on $D$ if and only if $x$ is a minimum of $-f$ on $D$; and $z$ is a minimum
of $f$ on $D$ if and only if $z$ is a maximum of $-f$ on $D$.


Proof The point $x$ maximizes $f$ over $D$ if and only if $f(x) \ge f(y)$ for all $y \in D$,
while $x$ minimizes $-f$ over $D$ if and only if $-f(x) \le -f(y)$ for all $y \in D$.
Since $f(x) \ge f(y)$ is the same as $-f(x) \le -f(y)$, the first part of the theorem is
proved. The second part of the theorem follows from the first simply by noting that
$-(-f) = f$. □


Theorem 2.5 Let $\varphi: \mathbb{R} \to \mathbb{R}$ be a strictly increasing function, that is, a function
such that

$$x > y \ \text{ implies } \ \varphi(x) > \varphi(y).$$

Then $x$ is a maximum of $f$ on $D$ if and only if $x$ is also a maximum of the composition
$\varphi \circ f$ on $D$; and $z$ is a minimum of $f$ on $D$ if and only if $z$ is also a minimum of $\varphi \circ f$
on $D$.


Remark As will be evident from the proof, it suffices that $\varphi$ be a strictly increasing
function on just the set $f(D)$, i.e., that $\varphi$ only satisfy $\varphi(z_1) > \varphi(z_2)$ for all $z_1, z_2 \in
f(D)$ with $z_1 > z_2$. □


Proof We deal with the maximization problem here; the minimization problem is
left as an exercise. Suppose first that $x$ maximizes $f$ over $D$. Pick any $y \in D$. Then,
$f(x) \ge f(y)$, and since $\varphi$ is strictly increasing, $\varphi(f(x)) \ge \varphi(f(y))$. Since $y \in D$
was arbitrary, this inequality holds for all $y \in D$, which states precisely that $x$ is a
maximum of $\varphi \circ f$ on $D$.

Now suppose that $x$ maximizes $\varphi \circ f$ on $D$, so $\varphi(f(x)) \ge \varphi(f(y))$ for all $y \in D$. If
$x$ did not also maximize $f$ on $D$, there would exist $y^* \in D$ such that $f(y^*) > f(x)$.
Since $\varphi$ is a strictly increasing function, it follows that $\varphi(f(y^*)) > \varphi(f(x))$, so $x$
does not maximize $\varphi \circ f$ over $D$, a contradiction, completing the proof. □
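
Theorems 2.4 and 2.5 can be illustrated on a finite constraint set. The following sketch assumes a grid of rational points in $[-1, 1]$, the objective $f(x) = x^2$ of Example 2.3, and the strictly increasing transformation $\varphi(v) = v^3 + v$; these choices and the helper names are illustrative and are not taken from the text.

```python
from fractions import Fraction

def argmax_set(f, points):
    """All maximizers of f over a finite list of points."""
    best = max(f(x) for x in points)
    return [x for x in points if f(x) == best]

def argmin_set(f, points):
    """All minimizers of f over a finite list of points."""
    best = min(f(x) for x in points)
    return [x for x in points if f(x) == best]

D = [Fraction(k, 50) for k in range(-50, 51)]   # finite stand-in for [-1, 1]

def f(x):
    return x * x

def neg_f(x):
    return -f(x)

def phi(v):
    """A strictly increasing function on R (its derivative is 3v^2 + 1 > 0)."""
    return v**3 + v

def phi_f(x):
    return phi(f(x))

# Theorem 2.4: maximizers of f are the minimizers of -f, and vice versa.
same_under_negation = (argmax_set(f, D) == argmin_set(neg_f, D)
                       and argmin_set(f, D) == argmax_set(neg_f, D))

# Theorem 2.5: a strictly increasing transformation leaves solution sets unchanged.
same_under_phi = (argmax_set(f, D) == argmax_set(phi_f, D)
                  and argmin_set(f, D) == argmin_set(phi_f, D))
```

Exact rational arithmetic is used so that ties between maximizers, such as $x = -1$ and $x = 1$ here, are detected without floating-point error.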


2.2 Optimization Problems in Parametric Form 


It is often the case that optimization problems are presented in what is called pa-
rametric form, that is, the objective function and/or the feasible set depend on the
value of a parameter $\theta$ in some set of feasible parameter values $\Theta$. Indeed, all of the
examples of optimization problems presented in Section 2.3 below belong to this
class. Although we do not study parametrized optimization problems until Chapter
9, it is useful to introduce at this point the notation we shall use to represent these
problems in the abstract. We will denote by $\Theta$ the set of all parameters of interest,
with $\theta$ denoting a typical parameter configuration in $\Theta$. Given a particular value
$\theta \in \Theta$, the objective function and the feasible set of the optimization problem under
$\theta$ will be denoted $f(\cdot, \theta)$ and $D(\theta)$, respectively. Thus, if the optimization problem
is a maximization problem, it will be written as

$$\max\{f(x, \theta) \mid x \in D(\theta)\},$$

while the corresponding minimization problem will be denoted

$$\min\{f(x, \theta) \mid x \in D(\theta)\}.$$

Notice that although our notation is general enough to allow both $f$ and $D$ to depend
in a non-trivial way on $\theta$, it also admits as a special case the situation where $\theta$ affects
the shape of only the objective function or only the constraint set. More generally, it
allows for the possibility that $\theta$ is a vector of parameters, some of which affect only
the objective function, and others only the constraint set.

The set of maximizers of $f(\cdot, \theta)$ on $D(\theta)$ will typically depend in a non-trivial
way on $\theta$. It would be consistent with our notation for the unparametrized case to
denote this set by

$$\arg\max\{f(x, \theta) \mid x \in D(\theta)\}.$$

However, the set of maximizers is itself an object of considerable interest in parame-
trized optimization problems (see Section 2.4 below), and a less cumbersome notation
will be valuable. We shall use $D^*(\theta)$ to denote this set:

$$D^*(\theta) = \arg\max\{f(x, \theta) \mid x \in D(\theta)\} = \{x \in D(\theta) \mid f(x, \theta) \ge f(z, \theta) \text{ for all } z \in D(\theta)\}.$$

Similarly, we let $D_*(\theta)$ denote the set of minimizers of $f(\cdot, \theta)$ on $D(\theta)$.

Lastly, we will denote by $f^*(\theta)$ and $f_*(\theta)$, respectively, the supremum and infimum
of the objective function given the parameter configuration $\theta$. That is, if

$$f(D(\theta)) = \{y \in \mathbb{R} \mid y = f(x, \theta) \text{ for some } x \in D(\theta)\}$$

represents the set of attainable values of $f(\cdot, \theta)$ on $D(\theta)$, then

$$f^*(\theta) = \sup f(D(\theta))$$

and

$$f_*(\theta) = \inf f(D(\theta)).$$

We call $f^*$ the value function of the problem $\max\{f(x, \theta) \mid x \in D(\theta)\}$, and the
function $f_*$ the value function of the minimization problem $\min\{f(x, \theta) \mid x \in D(\theta)\}$.
We shall, in the sequel, refer to these functions as the maximized value function and
the minimized value function, respectively.³ Observe that if $D^*(\theta)$ is nonempty for
some $\theta$, then we must have $\sup f(D(\theta)) = f(x^*, \theta) = f(x', \theta)$ for any $x^*$ and $x'$
in $D^*(\theta)$, so in this case, we may also write

$$f^*(\theta) = f(x^*, \theta) \quad \text{for any } x^* \in D^*(\theta).$$

A similar remark is valid for $f_*(\theta)$ when $D_*(\theta)$ is nonempty.
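
The notation $D(\theta)$, $D^*(\theta)$, and $f^*(\theta)$ can be made concrete with a small computation. The sketch below is hypothetical: it assumes the objective $f(x, \theta) = \theta x - x^2$ and a constraint set that does not vary with $\theta$ (a finite grid of rational points in $[0, 1]$); neither comes from the text. On a finite set the supremum is attained, so $f^*(\theta)$ is computed as a maximum.

```python
from fractions import Fraction

GRID = [Fraction(k, 100) for k in range(101)]   # finite stand-in for [0, 1]

def D(theta):
    """Feasible set under theta (here, as a special case, independent of theta)."""
    return GRID

def f(x, theta):
    """Objective function f(x, theta) = theta * x - x^2."""
    return theta * x - x * x

def D_star(theta):
    """Set of maximizers of f(., theta) on D(theta)."""
    feasible = D(theta)
    best = max(f(x, theta) for x in feasible)
    return [x for x in feasible if f(x, theta) == best]

def f_star(theta):
    """Maximized value function; sup equals max because D(theta) is finite."""
    return max(f(x, theta) for x in D(theta))

# The solution set and the maximized value both move with the parameter.
table = {theta: (D_star(theta), f_star(theta))
         for theta in (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3))}
```

For $\theta \in [0, 2]$ the unconstrained maximizer $\theta/2$ is feasible, while for larger $\theta$ the constraint $x \le 1$ binds; the table reflects both regimes.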


2.3 Optimization Problems: Some Examples 


Economics and its allied disciplines are rich in decision problems that can be cast 
as optimization problems. This section describes simple versions of several of these 
problems. Many of these examples are invoked elsewhere in the book to illustrate the 
use of the techniques we shall develop. All of the examples presented in this section 
are also optimization problems in parametric form. This is pointed out explicitly 
in some cases, but in others we leave it to the reader to identify the parameters of 
interest. 


2.3.1 Utility Maximization 
A basic, but typical, model in consumer theory concerns a single agent who consumes
$n$ commodities in nonnegative quantities. Her utility from consuming $x_i \ge 0$ units
of commodity $i$ ($i = 1, \ldots, n$) is given by $u(x_1, \ldots, x_n)$, where $u: \mathbb{R}^n_+ \to \mathbb{R}$ is
the agent's utility function. The agent has an income of $I \ge 0$, and faces the price
vector $p = (p_1, \ldots, p_n)$, where $p_i \ge 0$ denotes the unit price of the $i$-th commodity.


3 There is a slight abuse of terminology here. Since $f^*(\theta)$ is defined through the supremum, the "maximized
value function" could be well defined even if a maximum does not exist.

Her budget set (i.e., the set of feasible or affordable consumption bundles, given her
income $I$ and the prices $p$) is denoted $B(p, I)$, and is given by

$$B(p, I) = \{x \in \mathbb{R}^n_+ \mid p \cdot x \le I\}.$$

The agent's objective is to maximize the level of her utility over the set of affordable
commodity bundles, i.e., to solve:

Maximize $u(x)$ subject to $x \in B(p, I)$.

The utility maximization problem is a typical example of an optimization problem
in parametric form. When the price vector $p$ and/or the income $I$ change, so too does
the feasible set⁴ $B(p, I)$, and therefore, the underlying optimization problem. Thus,
$p$ and $I$ parametrize this problem. Since prices and income are usually restricted to
taking on nonnegative values, the range of possible values for these parameters is
$\mathbb{R}^n_+ \times \mathbb{R}_+$.

Of course, depending on the purpose of the investigation, $p$ and $I$ may not be
the only, or even the primary, parameters of interest. For instance, we may wish
to study the problem when the utility function is restricted to being in the Cobb-
Douglas class:

$$u(x_1, \ldots, x_n) = x_1^{a_1} \cdots x_n^{a_n}, \quad a_i > 0 \text{ for all } i.$$

In particular, our objective may be to examine how the solutions to the problem
change as the vector of "weights" $a = (a_1, \ldots, a_n)$ changes, for a given level of
prices and income. The vector $a$ would then constitute the only parameter of interest.
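
A small numerical sketch may help fix ideas for the Cobb-Douglas case with two goods. It relies on a standard result that is not derived in this section: with strictly positive prices, the Cobb-Douglas consumer spends the share $a_i / (a_1 + \cdots + a_n)$ of income on good $i$. The prices, income, weights, grid step, and function names below are all illustrative assumptions.

```python
from fractions import Fraction

def budget_set(p, income, step):
    """Finite grid inside B(p, I) = {x >= 0 : p.x <= I}, for two goods."""
    n1 = int(income / (p[0] * step))   # largest affordable grid index, good 1
    n2 = int(income / (p[1] * step))   # largest affordable grid index, good 2
    return [(i * step, j * step)
            for i in range(n1 + 1)
            for j in range(n2 + 1)
            if p[0] * i * step + p[1] * j * step <= income]

def cobb_douglas(x, a):
    """u(x1, x2) = x1**a1 * x2**a2; integer weights keep the arithmetic exact."""
    return x[0] ** a[0] * x[1] ** a[1]

def closed_form_demand(p, income, a):
    """Standard Cobb-Douglas demand (assumes every price is strictly positive)."""
    total = sum(a)
    return tuple(Fraction(a_i, total) * income / p_i for a_i, p_i in zip(a, p))

p = (Fraction(2), Fraction(1))   # hypothetical prices
income = Fraction(12)            # hypothetical income
a = (1, 2)                       # hypothetical weights

grid = budget_set(p, income, Fraction(1, 2))
best_on_grid = max(grid, key=lambda x: cobb_douglas(x, a))
demand = closed_form_demand(p, income, a)
```

Changing `a` while holding `p` and `income` fixed shows how the weights act as the parameter of interest: the expenditure shares move with $a$, and the chosen bundle moves with them.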


2.3.2 Expenditure Minimization 


The expenditure minimization problem represents the flip side of utility maximiza-
tion. The problem takes as given a price vector $p \in \mathbb{R}^n_+$, and asks: what is the
minimum amount of income needed to give a utility-maximizing consumer a utility
level of at least $\bar{u}$, where $\bar{u}$ is some fixed utility level (possibly corresponding to some
given commodity bundle)? Thus, the constraint set in this problem is the set

$$X(\bar{u}) = \{x \in \mathbb{R}^n_+ \mid u(x) \ge \bar{u}\}.$$

The objective is to solve:

Minimize $p \cdot x$ subject to $x \in X(\bar{u})$.


4 Well, almost. The feasible set will change as long as prices and income do not change in the same proportion.

2.3.3 Profit Maximization


Producer theory in economics studies the decision-making processes of firms. A
canonical, if simple, model involves a firm which produces a single output using $n$
inputs through the production relationship $y = g(x_1, \ldots, x_n)$, where $x_i$ denotes the
quantity of the $i$-th input used in the production process, $y$ is the resultant output,
and $g: \mathbb{R}^n_+ \to \mathbb{R}_+$ is the firm's production function or technology. The unit price of
input $i$ is $w_i \ge 0$. When the firm produces $y$ units of output, the unit price it can
obtain is given by $p(y)$, where $p: \mathbb{R}_+ \to \mathbb{R}_+$ is the market (inverse-)demand curve.
Inputs may be used in any nonnegative quantities. Thus, the set of feasible input
combinations is $\mathbb{R}^n_+$. Letting $w$ denote the input price vector $(w_1, \ldots, w_n)$, and $x$
the input vector $(x_1, \ldots, x_n)$, the firm's objective is to choose an input mix that will
maximize its level of profits, that is, which solves

Maximize $p(g(x))g(x) - w \cdot x$ subject to $x \in \mathbb{R}^n_+$.

The profit maximization problem is another typical example of an optimization
problem in parametric form. The input price vector $w$, which affects the shape of
the objective function, evidently parametrizes the problem, but there could also be
other parameters of interest, such as those affecting the demand curve $p(\cdot)$, or the
technology $g(\cdot)$. For example, in the case of a competitive firm, we have $p(y) = p$
for all $y \ge 0$, where $p > 0$ is given. In this case, the value of $p$ is obviously important
for the objective function; $p$, therefore, constitutes a parameter of the problem.
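
To see the competitive case at work, consider a hypothetical firm with a single input and technology $g(x) = \sqrt{x}$. Profit is then $p\sqrt{x} - wx$, and setting its derivative to zero gives $x^* = (p/2w)^2$ with maximized profit $p^2/4w$. The sketch below compares this closed form with a grid search; the technology, the price values, and the grid are assumptions chosen for illustration.

```python
import math

def profit(x, p, w):
    """Competitive profit p * g(x) - w * x with the assumed technology g(x) = sqrt(x)."""
    return p * math.sqrt(x) - w * x

def grid_maximizer(p, w, upper=10.0, n=1000):
    """Best input level on an evenly spaced grid of [0, upper]."""
    grid = [upper * k / n for k in range(n + 1)]
    return max(grid, key=lambda x: profit(x, p, w))

def closed_form(p, w):
    """First-order condition p / (2 sqrt(x)) = w, valid for p > 0 and w > 0."""
    x_opt = (p / (2.0 * w)) ** 2
    return x_opt, p * p / (4.0 * w)

# The parameters (p, w) shift both the optimal input and the maximized profit.
results = {}
for p, w in [(4.0, 1.0), (4.0, 2.0)]:
    x_grid = grid_maximizer(p, w)
    x_opt, pi_opt = closed_form(p, w)
    results[(p, w)] = (x_grid, x_opt, pi_opt)
```

Doubling the input price from 1 to 2 in this example cuts the optimal input level from 4 to 1, which is the kind of parametric dependence the text describes.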


2.3.4 Cost Minimization 
The cost minimization problem for a firm is similar to the expenditure minimization
problem for a consumer. The aim is to identify the mix of inputs that will minimize the
cost of producing at least $\bar{y}$ units of output, given the production function $g: \mathbb{R}^n_+ \to
\mathbb{R}_+$ and the input price vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n_+$. Thus, the set of feasible
input choices is

$$F(\bar{y}) = \{x \in \mathbb{R}^n_+ \mid g(x) \ge \bar{y}\},$$

and the problem is to solve

Minimize $w \cdot x$ subject to $x \in F(\bar{y})$.

Implicitly, this formulation of the cost-minimization problem presumes that a "free
disposal" condition holds, i.e., that any excess amount produced may be costlessly
discarded. If there is a nonzero cost of disposal, this should also be built into the
objective function.


5 If the firm operates in a competitive environment, we usually assume that $p(y) = p$ for all $y \ge 0$ for some
fixed $p$, i.e., that the firm has no influence over market prices.

2.3.5 Consumption-Leisure Choice


A simple model of the labor-supply decision process of households may be obtained
from the utility maximization problem by making the income level $I$ also an object of
choice. That is, it is assumed that the agent begins with an endowment of $H > 0$ units
of “time,” and faces a wage rate of $w > 0$. By choosing to work for $L$ units of time,
the agent can earn an income of $wL$; thus, the maximum possible income is $wH$,
while the minimum is zero. The agent also gets utility from leisure, i.e., from the time
not spent working. Thus, the agent’s utility function takes the form $u(x_1, \ldots, x_n, l)$,
where $x_i$ represents the consumption of commodity $i$, and $l = H - L$ is the amount
of leisure enjoyed. Letting $p = (p_1, \ldots, p_n) \in \mathbb{R}^n_+$ denote the price vector for the
$n$ consumption goods, the set of feasible leisure-consumption bundles for the agent
is, therefore, given by:

\[ F(p, w) = \{(x, l) \in \mathbb{R}^{n+1}_+ \mid p \cdot x \le w(H - l),\ l \le H\}. \]

The agent’s maximization problem may now be represented as:

\[ \text{Maximize } u(x, l) \ \text{ subject to } (x, l) \in F(p, w). \]
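
To make the feasible set of the consumption-leisure problem concrete, the following is a minimal sketch in Python. The function names, the single consumption good, and the logarithmic utility $a \ln x + (1 - a) \ln l$ are all assumptions made purely for illustration; none of them comes from the text. The sketch checks membership in the feasible set and then searches a grid of leisure levels, spending all labor income on the one good.

```python
import math

def is_feasible(x, leisure, p, w, H):
    """True if (x, leisure) is nonnegative, leisure <= H, and
    spending p.x does not exceed labor income w * (H - leisure)."""
    if leisure < 0 or leisure > H or any(xi < 0 for xi in x):
        return False
    spending = sum(pi * xi for pi, xi in zip(p, x))
    return spending <= w * (H - leisure)

def best_on_grid(p, w, H, a=0.5, steps=240):
    """Crude grid search for ONE good under the assumed utility
    u(x, l) = a*ln(x) + (1 - a)*ln(l) (an illustrative choice)."""
    best = None
    for k in range(1, steps):            # skip l = 0 and l = H, where ln is undefined
        leisure = H * k / steps
        x = w * (H - leisure) / p        # all labor income is spent on the good
        value = a * math.log(x) + (1 - a) * math.log(leisure)
        if best is None or value > best[0]:
            best = (value, x, leisure)
    return best[1], best[2]

# Hypothetical numbers: price 2, wage 10, time endowment 24.
x_star, l_star = best_on_grid(2.0, 10.0, 24.0)
print(x_star, l_star, is_feasible([x_star], l_star, [2.0], 10.0, 24.0))
```

With these particular numbers the grid contains the exact optimum of the assumed utility, since leisure $(1 - a)H$ is a grid point; in general a grid search only approximates the solution.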


2.3.6 Portfolio Choice 


The portfolio selection problem in finance involves a single agent who faces $S$ possible
states of the world, one of which will be realized as the “true state.” The agent
has a utility function $U: \mathbb{R}^S_+ \to \mathbb{R}$ defined across these states: if the income the agent
will have in state $s$ is $y_s$, $s = 1, \ldots, S$, his utility is given by $U(y_1, \ldots, y_S)$.
If the true state of the world is revealed as $s$, the agent will have an endowment of
$\omega_s \in \mathbb{R}_+$; this represents the agent’s initial income in state $s$. Let $\omega = (\omega_1, \ldots, \omega_S)$
represent the initial endowment vector of the agent.

Before the true state is revealed, the agent may shift income from one state to
another by buying and selling securities. There are $N$ securities available. One unit
of security $i$ costs $p_i > 0$ and pays an income of $z_{is}$ if state $s$ occurs.

A portfolio is a vector of securities $\phi = (\phi_1, \ldots, \phi_N) \in \mathbb{R}^N$. If the agent selects
the portfolio $\phi$, then his income from the securities in state $s$ is $\sum_{i=1}^{N} \phi_i z_{is}$; thus, his
overall income in state $s$ is

\[ y_s(\phi) = \omega_s + \sum_{i=1}^{N} \phi_i z_{is}. \]

Letting $y(\phi) = (y_1(\phi), \ldots, y_S(\phi))$, it follows that the agent’s utility from choosing
the portfolio $\phi$ is $U(y(\phi))$.
A portfolio $\phi$ is affordable if the amount made from the sales of securities is
sufficient to cover the purchases of securities, i.e., if $p \cdot \phi \le 0$. It leads to a feasible
consumption policy if income is nonnegative in all states, i.e., if $y_s(\phi) \ge 0$ for all
$s$. A portfolio is feasible if it is both affordable and leads to a feasible consumption
policy. Thus, the set of all feasible portfolios is

\[ \Phi(p, \omega) = \{\phi \in \mathbb{R}^N \mid p \cdot \phi \le 0 \text{ and } y(\phi) \ge 0\}. \]

The agent’s problem is to select the portfolio that maximizes his utility from among
the set of all feasible portfolios:

\[ \text{Maximize } U(y(\phi)) \ \text{ subject to } \phi \in \Phi(p, \omega). \]


As with the earlier examples, the portfolio selection problem is also one in parametric
form. In this case, the parameters of interest include the initial endowment
vector $\omega$, the security price vector $p$, and the security payoffs $(z_{ij})$. Of course, once
again, these need not be the only parameters of interest. For instance, the agent’s
utility function may be of the expected utility form

\[ U(y_1, \ldots, y_S) = \sum_{i=1}^{S} \pi_i u(y_i), \]

where $\pi_i$ represents the (agent’s subjective) probability of state $i$ occurring, and $u(y_i)$
is the utility the agent obtains from having an income of $y_i$ in state $i$. In this case, the
probability vector $\pi = (\pi_1, \ldots, \pi_S)$ also forms part of the parameters of interest; so
too may any parameters that affect the form of $u(\cdot)$.
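
The objects of the portfolio problem translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the payoff table `z[i][s]` stores the income paid by security `i` in state `s`, and the square-root utility and the two-state numbers are assumptions chosen for the example, not taken from the text.

```python
import math

def state_incomes(phi, omega, z):
    """Income in each state: omega[s] + sum_i phi[i] * z[i][s]."""
    return [omega[s] + sum(phi[i] * z[i][s] for i in range(len(phi)))
            for s in range(len(omega))]

def is_feasible(phi, p, omega, z):
    """Affordable (p.phi <= 0) and nonnegative income in every state."""
    cost = sum(pi * fi for pi, fi in zip(p, phi))
    return cost <= 0 and all(y >= 0 for y in state_incomes(phi, omega, z))

def expected_utility(phi, omega, z, probs, u=math.sqrt):
    """Expected utility sum_s pi_s * u(y_s); u = sqrt is an assumed example."""
    return sum(pr * u(y) for pr, y in zip(probs, state_incomes(phi, omega, z)))

# Hypothetical data: two states, two securities paying 1 in one state each.
z = [[1, 0], [0, 1]]
p = [0.5, 0.5]
omega = [4, 0]
phi = [-2, 2]          # sell 2 units of security 1, buy 2 units of security 2
print(state_incomes(phi, omega, z), is_feasible(phi, p, omega, z))
print(expected_utility(phi, omega, z, [0.5, 0.5]),
      expected_utility([0, 0], omega, z, [0.5, 0.5]))
```

In this example the agent uses the securities to move income from the state with a high endowment to the state with none, which raises expected utility under the assumed concave $u$.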


2.3.7 Identifying Pareto Optima 


The notion of a Pareto optimum is fundamental to many areas of economics. An 
economic outcome involving many agents is said to be Pareto optimal if it is not 
possible to readjust the outcome in a manner that makes some agent better off, 
without making at least one of the other agents worse off. 

The identification and characterization of Pareto optimal outcomes is a matter of 
considerable interest in many settings. A simple example involves allocating a given 
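total supply of resource $\omega \in \mathbb{R}^n_+$ between two agents. Agent $i$’s utility from receiving
the allocation $x_i \in \mathbb{R}^n_+$ is $u_i(x_i)$, where $u_i: \mathbb{R}^n_+ \to \mathbb{R}$ is agent $i$’s utility function. An
allocation $(x_1, x_2)$ is feasible if $x_1, x_2 \ge 0$ and $x_1 + x_2 \le \omega$. Let $F(\omega)$ be the set of
all feasible outcomes:

\[ F(\omega) = \{(x_1, x_2) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ \mid x_1 + x_2 \le \omega\}. \]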


A feasible allocation $(x_1, x_2)$ is Pareto optimal if there does not exist another feasible
allocation $(x_1', x_2')$ such that

\[ u_i(x_i') \ge u_i(x_i), \quad \text{for } i = 1, 2, \]

with strict inequality for at least one $i$.
There are several alternative methods for identifying a Pareto optimal allocation.
One is to use a “weighted” utility function. Pick any $\alpha \in (0, 1)$, and let $U(x_1, x_2, \alpha)$
be defined by

\[ U(x_1, x_2, \alpha) = \alpha u_1(x_1) + (1 - \alpha) u_2(x_2). \]

Now consider the optimization problem

\[ \text{Maximize } U(x_1, x_2, \alpha) \ \text{ subject to } (x_1, x_2) \in F(\omega). \]


Observe that the weight $\alpha \in (0, 1)$ is a parameter of this optimization problem,
as is the total endowment vector $\omega$. It is not very hard to see that every solution
$(x_1^*, x_2^*)$ to this weighted utility maximization problem must define a Pareto optimal
outcome. For, if $(x_1^*, x_2^*)$ were a solution, but were not Pareto optimal, there would
exist $(x_1', x_2') \in F(\omega)$ such that $u_i(x_i') \ge u_i(x_i^*)$ for $i = 1, 2$, with at least one strict
inequality. Since $\alpha$ and $1 - \alpha$ are both strictly positive, this would imply

\[ U(x_1', x_2', \alpha) = \alpha u_1(x_1') + (1 - \alpha) u_2(x_2') > \alpha u_1(x_1^*) + (1 - \alpha) u_2(x_2^*) = U(x_1^*, x_2^*, \alpha), \]

which contradicts the fact that $(x_1^*, x_2^*)$ is a maximum of the weighted utility
maximization problem.
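
The argument above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: a single divisible good, a total endowment of 1, square-root utilities for both agents, and a finite grid of allocations. All function names are hypothetical. It maximizes the weighted utility over the grid and then verifies by brute force that no grid allocation Pareto-dominates the maximizer.

```python
import math

def feasible_grid(omega, steps=100):
    """Grid approximation of the feasible set for ONE good:
    pairs (x1, x2) on the grid with x1 + x2 <= omega."""
    grid = [omega * k / steps for k in range(steps + 1)]
    return [(a, b) for a in grid for b in grid if a + b <= omega + 1e-12]

def weighted_optimum(alpha, feasible, u1, u2):
    """Maximizer of alpha*u1(x1) + (1 - alpha)*u2(x2) over the grid."""
    return max(feasible,
               key=lambda xy: alpha * u1(xy[0]) + (1 - alpha) * u2(xy[1]))

def dominates(c, d, u1, u2):
    """True if allocation c Pareto-dominates allocation d."""
    weakly = u1(c[0]) >= u1(d[0]) and u2(c[1]) >= u2(d[1])
    strictly = u1(c[0]) > u1(d[0]) or u2(c[1]) > u2(d[1])
    return weakly and strictly

feasible = feasible_grid(1.0)
best = weighted_optimum(0.5, feasible, math.sqrt, math.sqrt)
print(best, any(dominates(c, best, math.sqrt, math.sqrt) for c in feasible))
```

Varying the weight $\alpha$ over $(0, 1)$ traces out different Pareto optimal allocations; the check here covers only the grid, so it illustrates the result rather than proving it.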


2.3.8 Optimal Provision of Public Goods 


As opposed to private consumption goods, a public good in economics is defined 
as a good which has the characteristic that one agent’s consumption of the good 
does not affect other agents’ ability to also consume it. A central problem in public 
finance concerns the optimal provision of public goods. A simple version of this 
problem involves a setting with n agents, a single public good, and a single private 
good (“money”). The initial total endowment of the private good is $\omega > 0$, and that
of the public good is zero. The private good can either be used in the production
of the public good, or it can be allocated to the agents for consumption by them.
If $x \ge 0$ is the amount of the private good used in production of the public good,
then the amount of the public good produced is $h(x)$, where $h: \mathbb{R}_+ \to \mathbb{R}_+$. Letting
$x_i$ denote the allocation of the private good to agent $i$, the set of feasible combinations
of private consumption and public good provision that can be achieved is,
therefore,

\[ \Phi(\omega) = \{(x_1, \ldots, x_n, y) \in \mathbb{R}^{n+1}_+ \mid \text{there is } x \in \mathbb{R}_+ \text{ such that } y = h(x) \text{ and } x + x_1 + \cdots + x_n \le \omega\}. \]

The utility of any agent $i$ depends on her own consumption $x_i$ of the private good,
and on the level $y$ of provision of the public good, and is denoted $u_i(x_i, y)$. A social
welfare function is a function $W(u_1, \ldots, u_n)$ that takes as its arguments the
utility levels of the agents. (For instance, $W$ could be a weighted utility function
$\alpha_1 u_1 + \cdots + \alpha_n u_n$ as in the previous subsection.) The public goods provision problem
typically takes as given some social welfare function $W$, and aims to find the
combination of private consumption levels $(x_1, \ldots, x_n)$ and public good provision
$y$ that is feasible and that maximizes $W$, i.e., that solves

\[ \text{Maximize } W[u_1(x_1, y), \ldots, u_n(x_n, y)] \ \text{ subject to } (x_1, \ldots, x_n, y) \in \Phi(\omega). \]


2.3.9 Optimal Commodity Taxation 


A second question of interest in public finance is determining the optimal rate at 
which different commodities should be taxed to raise a given amount of money. It is 
taken for granted that consumers will react to tax changes in a manner that maximizes 
their utility under the new tax regime. Thus, the study of optimal taxation begins with 
the utility maximization problem of a typical consumer. 

There are $n$ goods in the model, indexed by $i$. The (pre-tax) prices of the $n$ commodities
are given by the vector $p = (p_1, \ldots, p_n)$. Given a tax per unit of $\tau_i$ on
the $i$-th commodity, the gross price vector faced by the consumer is $p + \tau$, where
$\tau = (\tau_1, \ldots, \tau_n)$. Thus, if $I$ represents the consumer’s income, and $u: \mathbb{R}^n_+ \to \mathbb{R}$ the
consumer’s utility function, the consumer solves

\[ \text{Maximize } u(x) \ \text{ subject to } x \in B(p + \tau, I) = \{x \in \mathbb{R}^n_+ \mid (p + \tau) \cdot x \le I\}. \]

Assume that a unique solution exists to this problem for each choice of values
of the parameter vector $(p, \tau, I)$,⁶ and denote this solution by $x(p + \tau, I)$. That is,
$x(p + \tau, I)$ is the commodity bundle that maximizes $u(x)$ over $B(p + \tau, I)$. Further,
let $v(p + \tau, I)$ be the maximized value of the utility function given $(p, \tau, I)$, i.e.,
$v(p + \tau, I) = u(x(p + \tau, I))$.

Now suppose the government has a given revenue requirement of $R$. The optimal
commodity taxation problem is to determine the tax vector $\tau$ that will (a) raise at
least the required amount of money, and (b) make the consumer best-off amongst
all plans that will also raise at least $R$ in tax revenue. That is, suppose that the tax
authority chooses the tax vector $\tau$. Then, the consumer will choose the commodity
bundle $x(p + \tau, I)$, so this will result in a total tax revenue of $\tau \cdot x(p + \tau, I)$. Thus,
the set of feasible tax vectors is

\[ T(p, I) = \{\tau \in \mathbb{R}^n \mid \tau \cdot x(p + \tau, I) \ge R\}. \]


⁶ It is possible to describe a suitable set of regularity conditions on $u$ under which such existence and
uniqueness may be guaranteed, but this is peripheral to our purposes here.
The tax rate of $\tau$ gives the optimally reacting consumer a maximized utility of
$v(p + \tau, I)$. Thus, the optimal commodity taxation problem is to solve

\[ \text{Maximize } v(p + \tau, I) \ \text{ subject to } \tau \in T(p, I). \]
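
The two-level structure of the taxation problem (the consumer reacts optimally, and the tax authority optimizes over the consumer's reaction) can be sketched as follows. Everything specific here is an assumption made for illustration: two goods, a Cobb-Douglas consumer with $u(x) = a \ln x_1 + (1 - a) \ln x_2$, for which demand has the well-known closed form $x_1 = aI/q_1$, $x_2 = (1 - a)I/q_2$ at gross prices $q = p + \tau$, nonnegative taxes only, and a coarse grid of tax rates. The function names are hypothetical.

```python
import math

def demand(q, income, a):
    """Cobb-Douglas demand at gross prices q (assumed utility form)."""
    return (a * income / q[0], (1 - a) * income / q[1])

def indirect_utility(q, income, a):
    """Maximized utility v(q, I) = u(x(q, I)) for the assumed utility."""
    x1, x2 = demand(q, income, a)
    return a * math.log(x1) + (1 - a) * math.log(x2)

def tax_revenue(p, tau, income, a):
    """Revenue tau . x(p + tau, I) raised from the reacting consumer."""
    x = demand((p[0] + tau[0], p[1] + tau[1]), income, a)
    return tau[0] * x[0] + tau[1] * x[1]

def optimal_tax(p, income, a, required, steps=20, tau_max=1.0):
    """Grid search over nonnegative tax vectors that raise at least
    `required`, picking the one the consumer likes best."""
    grid = [tau_max * k / steps for k in range(steps + 1)]
    feasible = [(t1, t2) for t1 in grid for t2 in grid
                if tax_revenue(p, (t1, t2), income, a) >= required]
    return max(feasible,
               key=lambda t: indirect_utility((p[0] + t[0], p[1] + t[1]), income, a))

# Hypothetical numbers: unit pre-tax prices, income 100, revenue target 20.
tau_star = optimal_tax((1.0, 1.0), 100.0, 0.5, 20.0)
print(tau_star, tax_revenue((1.0, 1.0), tau_star, 100.0, 0.5))
```

In this symmetric example the search settles on equal taxes on both goods; with other prices, weights, or utility functions the outcome will differ, and a finer grid or a proper solver would be needed for accuracy.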


2.4 The Objectives of Optimization Theory 


Optimization theory has two main objectives. The first is to identify a set of conditions 
on f and D under which the existence of solutions to optimization problems is 
guaranteed. An important advantage to be gained from having such conditions is 
that the case-by-case verification of existence of solutions in applications can be 
avoided. In particular, given a parametrized family of optimization problems, it will 
be possible to identify restrictions on parameter values which ensure existence of 
solutions. On the other hand, to be of use in a wide variety of applications, it is 
important that the identified conditions be as general as possible. This makes the 
“existence question” non-trivial. 

The second objective of optimization theory is more detailed and open-ended: it 
lies in obtaining a characterization of the set of optimal points. Broad categories of 
questions of interest here include the following: 


1. The identification of conditions that every solution to an optimization problem
must satisfy, that is, of conditions that are necessary for an optimum point. 

2. The identification of conditions such that any point that meets these conditions 
is a solution, that is, of conditions that are sufficient to identify a point as being 
optimal. 

3. The identification of conditions that ensure only a single solution exists to a 
given optimization problem, that is, of conditions that guarantee uniqueness of 
solutions. 

4. A general theory of parametric variation in a parametrized family of optimization
problems. For instance, given a family of optimization problems of the form
$\max\{f(x, \theta) \mid x \in D(\theta)\}$, we are interested in


(a) the identification of conditions under which the solution set $D^*(\theta)$ varies
“continuously” with $\theta$, that is, conditions which give rise to parametric
continuity; and

(b) in problems where the parameters $\theta$ and actions $x$ have a natural ordering,⁷
the identification of conditions under which an increase in the value of $\theta$
also leads to an increase in the value of the optimal action $x^*$, that is, of
conditions that lead to parametric monotonicity.


⁷ For instance, $\theta$ could be a real number representing a variable such as “income” or “price,” and $x$ could
represent consumption levels.
These questions, and related ones, form the focus of the next several chapters of
this book. As is perhaps to be expected, further hypotheses on the structure of the 
underlying optimization problems are required before many of them can be tackled. 
For instance, the notion of differentiability plays a central role in the identification 
of necessary conditions for optima, while the key concept in obtaining sufficient 
conditions is convexity. The next section elaborates further on this point. 


2.5 A Roadmap 


Chapter 3 begins our study of optimization with an examination of the fundamental 
question of existence. The main result of this chapter is the Weierstrass Theorem, 
which provides a general set of conditions under which optimization problems are 
guaranteed to have both maxima and minima. We then describe the use of this result 
in applications, and, in particular, illustrate how it could come in handy even in 
problems where the given structure violates the conditions of the theorem. 

Chapters 4 through 6 turn to a study of necessary conditions for optima using 
differentiability assumptions on the underlying problem. Since differentiability is 
only a local property, the results of these chapters all pertain only to local optima,
i.e., to points that are optima in some neighborhood, but not necessarily on all of the
constraint set. Formally, we say that $x$ is a local maximum of $f$ on $D$ if there is $r > 0$
such that $x$ is a maximum of $f$ on the set $B(x, r) \cap D$. Local minima are defined
similarly. Of course, every optimum of f on D is also a local optimum of f on D. 
To distinguish between points that are only local optima and those that are optima 
on all of D, we will refer to the latter as global optima. 

Chapter 4 looks at the simplest case, where local optima occur in the interior of 
the feasible set D, i.e., at a point where it is possible to move away from the optimum 
in any direction by at least a small amount without leaving D. The main results here 
pertain to necessary conditions that must be met by the derivatives of the objective 
function at such points. We also provide sufficient conditions on the derivatives of 
the objective function that identify specific points as being local optima. 

Chapters 5 and 6 examine the case where some or all of the constraints in D 
could matter at an optimal point. Chapter 5 focuses on the classical case where 
$D$ is specified implicitly using equality constraints, i.e., where there are functions
$g_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, k$, such that

\[ D = \{x \in \mathbb{R}^n \mid g_i(x) = 0,\ i = 1, \ldots, k\}. \]


The main result of this chapter is the Theorem of Lagrange, which describes necessary 
conditions that must be met at all local optima of such problems. Analogous to the 
results for interior optima, we also describe conditions that are sufficient to identify 
specific points as being local optima. The remainder of the chapter focuses on
using the theory in applications. We describe a “cookbook” procedure for using the
necessary conditions of the Theorem of Lagrange in finding solutions to equality-constrained
optimization problems. The procedure will not always work, since the
conditions of the theorem are only necessary, and not also sufficient. Nonetheless, 
the procedure is quite successful in practice, and the chapter explores why this is the 
case, and, importantly, when the procedure could fail. 

Chapter 6 moves on to inequality-constrained optimization problems, that is,
where the constraint set is specified using functions $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, l$, as

\[ D = \{x \in \mathbb{R}^n \mid h_i(x) \ge 0,\ i = 1, \ldots, l\}. \]


The centerpiece of this chapter is the Theorem of Kuhn and Tucker, which describes 
necessary conditions for optima in such problems. The remainder of the analysis in 
this chapter is, in some sense, isomorphic to that of Chapter 5. We discuss a cookbook 
procedure for using the Theorem of Kuhn and Tucker in applications. As with the 
cookbook procedure for solving equality-constrained problems, this one is also not 
guaranteed to be successful, and once again, for the same reason (namely, because
the conditions of the relevant theorem are only necessary, and not also sufficient).
Nonetheless, this procedure also works well in practice, and the chapter examines 
when the procedure will definitely succeed and when it could fail. 

Chapters 7 and 8 turn to a study of sufficient conditions, that is, conditions that, 
when met, will identify specific points as being global optima of given optimization 
problems. Chapter 7 presents the notion of convexity, that is, of convex sets, and of 
concave and convex functions, and examines the continuity, differentiability, and cur- 
vature properties of such functions. It also provides easy-to-use tests for identifying 
concave and convex functions in practice. The most important results of this chapter,
however, pertain to the use of convexity in optimization theory. We show that the
same first-order conditions that were proved, in earlier chapters, to be necessary for 
local optima, are also, under convexity conditions, sufficient for global optima. 

Although these results are very strong, the assumption of convexity is not an un- 
restrictive one. In Chapter 8, therefore, a weakening of this condition, called quasi- 
convexity, is studied. The weakening turns out to be substantial; quasi-concave and 
quasi-convex functions fail to possess many of the strong properties that characterize 
concave and convex functions. However, it is again possible to give tests for identi- 
fying such functions in practice. This is especially useful because our main result of 
this chapter shows that, under quasi-convexity, the first-order necessary conditions 
for local optima are “almost” sufficient for global optima; that is, they are sufficient 
whenever some mild additional regularity conditions are met. 

Chapters 9 and 10 address questions that arise from the study of parametric families
of optimization problems. Chapter 9 examines the issue of parametric continuity:
under what conditions on the primitives will the solutions to such problems vary
continuously with the parameters $\theta$? Of course, continuity in the solutions will not
obtain without continuity in the primitives, and this requires a notion of continuity
for a map such as $D(\cdot)$ which takes points $\theta \in \Theta$ into sets $D(\theta) \subset \mathbb{R}^n$. The study of
such point-to-set maps, called correspondences, is the starting point of this chapter.
With a satisfactory notion of continuity for correspondences in hand, Chapter 9 
presents one of the major results of optimization theory, the Maximum Theorem. 
Roughly speaking, the Maximum Theorem states that the continuity properties of the 
primitives are inherited, but not in their entirety, by the solutions. A second result, 
the Maximum Theorem under Convexity, studies the impact of adding convexity 
assumptions on the primitives. 

Chapter 10 looks at the problem of parametric monotonicity. It introduces the 
key notion of supermodularity of a function, which has become, in recent years, a 
valuable tool for the study of incomplete-information problems in economics. It is 
shown that, under some regularity conditions on the problem, supermodularity of the 
objective function suffices to yield monotonicity of optimal actions in the parameter. 

Chapters 11 and 12 introduce the reader to the field of dynamic programming, i.e.,
of multiperiod decision problems, in which the decisions taken in any period affect
the decision-making environment in all future periods. Chapter 11 studies dynamic 
programming problems with finite horizons. It is shown that under mild continuity 
conditions, an optimal strategy exists in such problems. More importantly, it is shown 
that the optimal strategies can be recovered by the process of backwards induction, 
that is, by beginning in the last period of the problem and working backwards to the
first period. 

Chapter 12 looks at dynamic programming problems with an infinite horizon. Such 
problems are more complex than the corresponding finite-horizon case, since they 
lack a “last” period. The key to finding a solution turns out to be the Bellman Equation,
which is essentially a statement of dynamic consistency. Using this equation,
we show that a solution to the problem exists under just continuity
and boundedness conditions, and that the solution itself can be obtained using a rather simple
procedure. A detailed presentation of the neoclassical model of economic growth
illustrates the use of these results; more importantly, it also demonstrates how con- 
vexity conditions can be worked into dynamic programming problems to obtain a 
sharp characterization of the solution. 


2.6 Exercises 


1. Is Theorem 2.5 valid if $\varphi$ is only required to be nondecreasing (i.e., if $x > y$
implies $\varphi(x) \ge \varphi(y)$) instead of strictly increasing? Why or why not?


2. Give an example of an optimization problem with an infinite number of solutions. 
3. Let $D = [0, 1]$. Describe the set $f(D)$ in each of the following cases, and identify
$\sup f(D)$ and $\inf f(D)$. In which cases does $f$ attain its supremum? What about
its infimum?

(a) $f(x) = 1 + x$ for all $x \in D$.

(b) $f(x) = 1$, if $x < 1/2$, and $f(x) = 2x$ otherwise.

(c) $f(x) = x$, if $x < 1$, and $f(1) = 2$.

(d) $f(0) = 1$, $f(1) = 0$, and $f(x) = 3x$ for $x \in (0, 1)$.

4. Let $D = [0, 1]$. Suppose $f: D \to \mathbb{R}$ is increasing on $D$, i.e., for $x, y \in D$, if
$x > y$, then $f(x) > f(y)$. [Note that $f$ is not assumed to be continuous on $D$.]
Is $f(D)$ a compact set? Prove your answer, or provide a counterexample.


5. Find a function $f: \mathbb{R} \to \mathbb{R}$ and a collection of sets $S_k \subset \mathbb{R}$, $k = 1, 2, 3, \ldots$,
such that $f$ attains a maximum on each $S_k$, but not on $\bigcap_{k=1}^{\infty} S_k$.


6. Give an example of a function $f: [0, 1] \to \mathbb{R}$ such that $f([0, 1])$ is an open set.


7. Give an example of a set $D \subset \mathbb{R}$ and a continuous function $f: D \to \mathbb{R}$ such that
$f$ attains its maximum, but not a minimum, on $D$.

8. Let $D = [0, 1]$. Let $f: D \to \mathbb{R}$ be an increasing function on $D$, and let $g: D \to \mathbb{R}$
be a decreasing function on $D$. (That is, if $x, y \in D$ with $x > y$ then $f(x) > f(y)$
and $g(x) < g(y)$.) Then, $f$ attains a minimum and a maximum on $D$ (at 0 and 1,
respectively), as does $g$ (at 1 and 0, respectively). Does $f + g$ necessarily attain
a maximum and minimum on $D$?


3 


Existence of Solutions: The Weierstrass Theorem 


We begin our study of optimization with the fundamental question of existence: 
under what conditions on the objective function f and the constraint set D are we 
guaranteed that solutions will always exist in optimization problems of the form 
$\max\{f(x) \mid x \in D\}$ or $\min\{f(x) \mid x \in D\}$? Equivalently, under what conditions on
$f$ and $D$ is it the case that the set of attainable values $f(D)$ contains its supremum
and/or infimum? 

Trivial answers to the existence question are, of course, always available: for 
instance, f is guaranteed to attain a maximum and a minimum on D if D is a finite 
set. On the other hand, our primary purpose in studying the existence issue is from the 
standpoint of applications: we would like to avoid, to the maximum extent possible, 
the need to verify existence on a case-by-case basis. In particular, when dealing with 
parametric families of optimization problems, we would like to be in a position to 
describe restrictions on parameter values under which solutions always exist. All of 
this is possible only if the identified set of conditions possesses a considerable degree 
of generality. 

The centerpiece of this chapter, the Weierstrass Theorem, describes just such a 
set of conditions. The statement of the theorem, and a discussion of its conditions, 
is the subject of Section 3.1. The use of the Weierstrass Theorem in applications is 
examined in Section 3.2. The chapter concludes with the proof of the Weierstrass 
Theorem in Section 3.3. 


3.1 The Weierstrass Theorem 


The following result, a powerful theorem credited to the mathematician Karl Weier- 
strass, is the main result of this chapter: 


Theorem 3.1 (The Weierstrass Theorem) Let $D \subset \mathbb{R}^n$ be compact, and let
$f: D \to \mathbb{R}$ be a continuous function on $D$. Then $f$ attains a maximum and a minimum
on $D$, i.e., there exist points $z_1$ and $z_2$ in $D$ such that

\[ f(z_1) \ge f(x) \ge f(z_2), \quad x \in D. \]

Proof See Section 3.3. □


It is of the utmost importance to realize that the Weierstrass Theorem only provides 
sufficient conditions for the existence of optima. The theorem has nothing to say about 
what happens if these conditions are not met, and, indeed, in general, nothing can
be said, as the following examples illustrate. In each of Examples 3.2-3.4, only a 
single condition of the Weierstrass Theorem is violated, yet maxima and minima fail 
to exist in each case. In the last example, all of the conditions of the theorem are 
violated, yet both maxima and minima exist. 


Example 3.2 Let $D = \mathbb{R}$, and $f(x) = x^3$ for all $x \in \mathbb{R}$. Then $f$ is continuous,
but $D$ is not compact (it is closed, but not bounded). Since $f(D) = \mathbb{R}$, $f$ evidently
attains neither a maximum nor a minimum on $D$. □


Example 3.3 Let $D = (0, 1)$ and $f(x) = x$ for all $x \in (0, 1)$. Then $f$ is continuous,
but $D$ is again noncompact (this time it is bounded, but not closed). The set $f(D)$ is
the open interval $(0, 1)$, so, once again, $f$ attains neither a maximum nor a minimum
on $D$. □
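
The failure of existence in Example 3.3 can be seen constructively: every candidate maximizer in $(0, 1)$ is beaten by the midpoint between it and 1, which still lies in $(0, 1)$. The short sketch below, with hypothetical function names, builds such a chain of strictly better points. It is an illustration only, and it is subject to floating-point rounding once the points get extremely close to 1.

```python
def improve(x):
    """Given x in (0, 1), return a point of (0, 1) where f(x) = x is larger."""
    return (x + 1) / 2

def chain(x0, n):
    """Start at x0 and apply `improve` n times, recording every point."""
    points = [x0]
    for _ in range(n):
        points.append(improve(points[-1]))
    return points

points = chain(0.5, 20)
print(points[:4], points[-1])
```

Each point in the chain is feasible and strictly better than the one before it, so no point of $(0, 1)$ can be a maximum, even though the values approach the supremum 1.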


Example 3.4 Let $D = [-1, 1]$, and let $f$ be given by

\[ f(x) = \begin{cases} x, & \text{if } -1 < x < 1, \\ 0, & \text{if } x = -1 \text{ or } x = 1. \end{cases} \]

Note that $D$ is compact, but $f$ fails to be continuous at just the two points $-1$ and $1$.
In this case, $f(D)$ is the open interval $(-1, 1)$; consequently, $f$ fails to attain either
a maximum or a minimum on $D$. □


Example 3.5 Let $D = \mathbb{R}_{++}$, and let $f: D \to \mathbb{R}$ be defined by

\[ f(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ 0, & \text{otherwise}. \end{cases} \]

Then $D$ is not compact (it is neither closed nor bounded), and $f$ is discontinuous
at every single point in $D$ (it “chatters” back and forth between the values 0 and 1).
Nonetheless $f$ attains a maximum (at every rational number) and a minimum (at
every irrational number). □
To restate the point: if the conditions of the Weierstrass Theorem are met, a maximum
and a minimum are guaranteed to exist. On the other hand, if one or more of
the theorem’s conditions fails, maxima and minima may or may not exist, depending 
on the specific structure of the problem in question. 

As one might perhaps anticipate from Examples 3.2 through 3.5, the proof of the 
Weierstrass Theorem presented in Section 3.3 proceeds in two steps. First, it is shown
that, under the stated conditions, $f(D)$ must be a compact set. Since compact sets in
$\mathbb{R}$ are necessarily also bounded, the unboundedness problem of Example 3.2, where
$\sup f(D) = +\infty$ and $\inf f(D) = -\infty$, cannot arise. Second, it is shown that if $A$
is a compact set in $\mathbb{R}$, $\max A$ and $\min A$ are always well-defined; in particular, the
possibility of f(D) being a bounded set that fails to contain its sup and inf, as in 
Examples 3.3 and 3.4, is also precluded. Since f(D) has been shown to be compact 
in step 1, the proof is complete. 


3.2 The Weierstrass Theorem in Applications 


Perhaps the most obvious use of the Weierstrass Theorem arises in the context of
optimization problems in parametric form: the theorem makes it possible for us to 
identify restrictions on parameter values that will guarantee existence of solutions in 
such problems, by simply identifying the subset of the parameter space on which the 
required continuity and compactness conditions are satisfied. The following example 
illustrates this point using the framework of subsection 2.3.1. 


Example 3.6 Consider the utility maximization problem

\[ \text{Maximize } u(x) \ \text{ subject to } x \in B(p, I) = \{x \in \mathbb{R}^n_+ \mid p \cdot x \le I\}, \]

where the price vector $p = (p_1, \ldots, p_n)$ and income $I$ are given. As is usual, we
shall restrict prices and income to taking on nonnegative values, so the parameter set
$\Theta$ in this problem is given by $\mathbb{R}^n_+ \times \mathbb{R}_+$.

It is assumed throughout this example that the utility function $u: \mathbb{R}^n_+ \to \mathbb{R}$ is
continuous on its domain. By the Weierstrass Theorem, a solution to the utility
maximization problem will always exist as long as the budget set $B(p, I)$ is compact.
We will show that compactness of this set obtains as long as $p \gg 0$; thus, a solution
to the utility maximization problem is guaranteed to exist for all $(p, I) \in \hat{\Theta}$, where
$\hat{\Theta} \subset \Theta$ is defined by

\[ \hat{\Theta} = \{(p, I) \in \Theta \mid p \gg 0\}. \]


So suppose that $(p, I) \in \hat{\Theta}$. Observe that even if the agent spent her entire income $I$
on commodity $j$, her consumption of this commodity cannot exceed $I/p_j$. Therefore,
if we define

\[ \xi = \max\left\{ \frac{I}{p_1}, \ldots, \frac{I}{p_n} \right\}, \]

it is the case that for any $x \in B(p, I)$, we have $0 \le x \le (\xi, \ldots, \xi)$. It follows that
$B(p, I)$ is bounded.

To see that $B(p, I)$ is also closed, let $\{x^k\}$ be a sequence in $B(p, I)$, and suppose
$x^k \to x$. We are to show that $x \in B(p, I)$, i.e., that $x \ge 0$ and $0 \le p \cdot x \le I$. The
first of these inequalities is easy: since $x^k \ge 0$ for all $k$, and $x^k \to x$, Theorem 1.8
implies $x \ge 0$. To see that $0 \le p \cdot x \le I$, pick any $j \in \{1, \ldots, n\}$. Since $x^k \to x$, we
have $x_j^k \to x_j$ by Theorem 1.7. Therefore, $p_j x_j^k \to p_j x_j$ for each $j$, and summing
over $j$, we obtain

\[ p \cdot x^k = \sum_{j=1}^{n} p_j x_j^k \to \sum_{j=1}^{n} p_j x_j = p \cdot x. \]

Since the sequence of real numbers $p \cdot x^k$ satisfies $0 \le p \cdot x^k \le I$ for each $k$, another
appeal to Theorem 1.8 yields $0 \le p \cdot x \le I$. Thus, $x \in B(p, I)$, and $B(p, I)$ is
closed.

As a closed and bounded subset of $\mathbb{R}^n$, it follows that $B(p, I)$ is compact. Since
$p \in \mathbb{R}^n_{++}$ and $I \ge 0$ were arbitrary, we are done. □
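
The boundedness step of Example 3.6 is easy to check numerically. The sketch below, with hypothetical function names and made-up prices and income, computes the bound $\xi = \max_j I/p_j$ and confirms on a random sample that every sampled point of the budget set lies in the box $[0, \xi]^n$. A random sample can only illustrate the claim, which the text establishes in general.

```python
import random

def budget_bound(p, income):
    """The bound xi = max_j income / p_j; requires every p_j > 0."""
    return max(income / pj for pj in p)

def in_budget_set(x, p, income):
    """True if x >= 0 and p.x <= income."""
    spending = sum(pj * xj for pj, xj in zip(p, x))
    return all(xj >= 0 for xj in x) and spending <= income

def sample_budget_points(p, income, n=5000, seed=0):
    """Draw points from the larger box [0, 1.5*xi]^n and keep those
    that fall in the budget set (the seed makes the draw repeatable)."""
    rng = random.Random(seed)
    xi = budget_bound(p, income)
    draws = ([rng.uniform(0, 1.5 * xi) for _ in p] for _ in range(n))
    return [x for x in draws if in_budget_set(x, p, income)]

# Hypothetical data: three goods with strictly positive prices.
p, income = [2.0, 4.0, 5.0], 20.0
xi = budget_bound(p, income)
points = sample_budget_points(p, income)
print(xi, len(points), all(xj <= xi for x in points for xj in x))
```

If some price were zero, the corresponding ratio would be undefined and the budget set unbounded in that direction, which is why the example restricts attention to $p \gg 0$.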


A similar, but considerably simpler, argument works for the problem described in 
subsection 2.3.7: 


Example 3.7 In subsection 2.3.7, we described how a weighted utility function can
be used to identify a Pareto-optimal division of a given quantity of resources between
two agents. The optimization exercise here is

\[ \text{Maximize } \alpha u_1(x_1) + (1 - \alpha) u_2(x_2) \ \text{ subject to } (x_1, x_2) \in F(\omega), \]

where $F(\omega) = \{(x_1, x_2) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ \mid x_1 + x_2 \le \omega\}$ represents the set of possible
divisions, and $u_i$ is agent $i$’s utility function, $i = 1, 2$. The parameters of this problem
are $\alpha \in (0, 1)$ and $\omega \in \mathbb{R}^n_+$; thus, the space $\Theta$ of possible parameter values is given
by $(0, 1) \times \mathbb{R}^n_+$.

It is easy to see that $F(\omega)$ is compact for any $\omega \in \mathbb{R}^n_+$. Moreover, the weighted utility
function $\alpha u_1(x_1) + (1 - \alpha) u_2(x_2)$ is continuous as a function of $(x_1, x_2)$ whenever
the underlying utility functions $u_1$ and $u_2$ are continuous on $\mathbb{R}^n_+$. Thus, provided only
that the utility functions $u_1$ and $u_2$ are continuous functions, the Weierstrass Theorem
assures us of the existence of a solution to this family of optimization problems for
every possible choice of parameters $(\alpha, \omega) \in \Theta$. □




Unlike these problems, natural formulations of some economic models involve 
feasible action sets that are noncompact for all feasible parameter values. At first 
blush, it might appear that the Weierstrass Theorem does not have anything to say 
in these models. However, it frequently happens that such problems may be reduced 
to equivalent ones with compact action sets, to which the Weierstrass Theorem is 
applicable. A typical example of this situation is the cost minimization problem 
outlined in subsection 2.3.4: 


Example 3.8 Recall that the cost minimization problem involves finding the cheapest mix of
inputs through which a firm may produce (at least) $y > 0$ units of output, given the
input price vector $w \in \mathbb{R}^n_+$, and technology $g: \mathbb{R}^n_+ \to \mathbb{R}$. We will suppose throughout
this example that $g$ is a continuous function on $\mathbb{R}^n_+$. The problem is to minimize the
(obviously continuous) objective function $w \cdot x$ over the feasible action set

\[ F(y) = \{x \in \mathbb{R}^n_+ \mid g(x) \ge y\}. \]

This feasible set is unbounded, and therefore noncompact, for many popularly
used forms for $g$, including the linear technology

\[ g(x) = a_1 x_1 + \cdots + a_n x_n, \quad a_i > 0 \text{ for all } i, \]

and members of the Cobb-Douglas family

\[ g(x) = x_1^{a_1} \cdots x_n^{a_n}, \quad a_i > 0 \text{ for all } i. \]

Thus, despite the continuity of the objective function, a direct appeal to the Weierstrass
Theorem is precluded.

However, for a suitable range of parameter values, it is possible in this problem to 
“compactify” the action set, and thus to bring it within the ambit of the Weierstrass 
Theorem. Specifically, one can show that, whenever the input price vector $w$ satisfies
$w \gg 0$, it is possible to define a compact subset $\hat{F}(y)$ of $F(y)$, such that attention
may be restricted to $\hat{F}(y)$ without any attendant loss of generality. Consider the
following procedure. Let $\hat{x} \in \mathbb{R}^n_+$ be any vector that satisfies $g(\hat{x}) \geq y$. (We presume
that at least one such vector exists; otherwise the set of feasible actions is empty, and
the problem is trivial.) Let $\hat{c} = w \cdot \hat{x}$. Define for $i = 1, \ldots, n$,

\[ \xi_i = \frac{2\hat{c}}{w_i}. \]


Then, the firm will never use more than $\xi_i$ units of input $i$: if it does so, then (no
matter what the quantities of the other inputs used) this will result in total cost
strictly exceeding $\hat{c}$. On the other hand, the required output level can be produced
at a total cost of $\hat{c}$ by using the input vector $\hat{x}$. It follows that we may, without loss,
restrict attention to the set of actions

\[ \hat{F}(y) = \{x \in \mathbb{R}^n_+ \mid g(x) \geq y,\ x_i \leq \xi_i \text{ for all } i\}. \]


This restricted set $\hat{F}(y)$ is clearly bounded. The continuity of $g$ implies it is also
closed. To see this, let $\{x^k\}$ be any sequence in $\hat{F}(y)$, and let $x^k \to x$. Then, $g(x^k) \geq y$
for each $k$, which implies by the continuity of $g$ that

\[ g(x) = \lim_{k \to \infty} g(x^k) \geq y. \]

Moreover, $x^k \geq 0$ for all $k$ implies $x \geq 0$ by Theorem 1.8, and, in the same way,
$x^k_i \leq \xi_i$ for all $k$ implies $x_i \leq \xi_i$ for each $i$. Therefore, $x \in \hat{F}(y)$,
so $\hat{F}(y)$ is closed. An appeal to the Weierstrass Theorem now shows the existence
of a solution to the problem of minimizing $w \cdot x$ over $\hat{F}(y)$, and, therefore, to the
problem of minimizing $w \cdot x$ over $F(y)$.¹ □
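
The bounds used in this compactification are easy to compute. The following is a minimal sketch, not part of the original text: the function names `input_bounds` and `cobb_douglas`, and the particular prices, exponents, and feasible vector, are assumptions chosen only for illustration.

```python
def input_bounds(w, x_hat):
    """Return (c_hat, bounds) with c_hat = w . x_hat and bounds[i] = 2 * c_hat / w[i].

    Requires w >> 0, as in the text; x_hat is any feasible input vector.
    """
    if any(wi <= 0 for wi in w):
        raise ValueError("the construction requires strictly positive input prices")
    c_hat = sum(wi * xi for wi, xi in zip(w, x_hat))
    return c_hat, [2.0 * c_hat / wi for wi in w]


def cobb_douglas(x, a):
    """Cobb-Douglas technology g(x) = x_1^{a_1} * ... * x_n^{a_n}."""
    out = 1.0
    for xi, ai in zip(x, a):
        out *= xi ** ai
    return out


# Hypothetical data: two inputs, required output y = 3.
w = [1.0, 2.0]
a = [0.5, 0.5]
y = 3.0
x_hat = [4.0, 4.0]                      # feasible, since g(x_hat) = 4 >= y
assert cobb_douglas(x_hat, a) >= y

c_hat, bounds = input_bounds(w, x_hat)
print(c_hat, bounds)                    # cost of x_hat and the caps on each input
```

Using more than `bounds[i]` units of input `i` costs more than `2 * c_hat` on that input alone, which already exceeds the cost `c_hat` of the feasible vector `x_hat`; this is why the search can be confined to the box defined by the bounds.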


Like the cost minimization problem, the expenditure minimization problem of sub- 
section 2.3.2 also involves an action space that is unbounded for most popularly used 
forms for the utility function. However, a construction similar to the one described 
here can also be employed to show that, under suitable conditions, the action space 
can be compactified, so that a solution to the expenditure minimization problem does 
exist (see the Exercises). 

Finally, the Weierstrass Theorem can also be used in conjunction with other results 
to actually identify an optimal point. We demonstrate its value in this direction in 
the succeeding chapters, where we present necessary conditions that every solution 
to an optimization problem must satisfy. By themselves, these necessary conditions 
are not enough to identify an optimum, since they are, in general, not also sufficient; 
in particular, there may exist points that are not optima that meet these conditions, 
and indeed, there may exist points meeting the necessary conditions without optima 
even existing. 

On the other hand, if the primitives of the problem also meet the compactness and
continuity conditions of the Weierstrass Theorem, a solution must exist. By definition,
moreover, the solution must satisfy the necessary conditions. It follows that one of the
points satisfying the necessary conditions must, in fact, be the solution. Therefore,
a solution can be found merely by evaluating the objective function at each point
that satisfies the necessary conditions, and choosing the point that maximizes (or, as
required, minimizes) the value of the objective on this set. Since the set of points at
which the necessary conditions are met is typically quite small, this procedure is not
very difficult to carry out.


¹For some forms of the production function $g$, the necessary compactification could have been achieved by
a simpler argument. For instance, it is apparent that when $g$ is linear, the firm will never produce an output
level strictly greater than the minimum required level $y$. (Otherwise, costs could be reduced by using less
of some input and producing exactly $y$.) So attention may be restricted to the set $\{x \in \mathbb{R}^n_+ \mid g(x) = y\}$,
which in this case is compact. However, the set $\{x \in \mathbb{R}^n_+ \mid g(x) = y\}$ is noncompact for many forms of $g$
(for instance, when $g$ is a member of the Cobb-Douglas family), so, unlike the method outlined in the
text, merely restricting attention to this set will not always work.


3.3 A Proof of the Weierstrass Theorem 


The following lemmata, when combined, prove the Weierstrass Theorem. The first
shows that, under the theorem’s assumptions, $f(D)$ must be a compact set, so, in
particular, the problem of unboundedness cannot arise. The second completes the
proof by showing that if $A \subset \mathbb{R}$ is a compact set, $\max A$ and $\min A$ are both well
defined, so the “openness” problem is also precluded.


Lemma 3.9 If $f: D \to \mathbb{R}$ is continuous on $D$, and $D$ is compact, then $f(D)$ is also
compact.


Proof Pick any sequence $\{y_k\}$ in $f(D)$. The lemma will be proved if we can show
that $\{y_k\}$ must have a convergent subsequence, i.e., there is a subsequence $\{y_{m(k)}\}$ of
$\{y_k\}$ and a point $y \in f(D)$ such that $y_{m(k)} \to y$.

For each $k$, pick $x_k \in D$ such that $f(x_k) = y_k$. (For each $k$, at least one such point
$x_k$ must exist by definition of $f(D)$; if more than one exists, pick any one.) This
gives us a sequence $\{x_k\}$ in $D$. Since $D$ is compact, there is a subsequence $\{x_{m(k)}\}$ of
$\{x_k\}$, and a point $x \in D$ such that $x_{m(k)} \to x$.

Define $y = f(x)$, and $y_{m(k)} = f(x_{m(k)})$. By construction, $\{y_{m(k)}\}$ is a subsequence
of the original sequence $\{y_k\}$. Moreover, since $x \in D$, it is the case that $y = f(x) \in
f(D)$. Finally, since $x_{m(k)} \to x$ and $f$ is a continuous function, $y_{m(k)} = f(x_{m(k)}) \to
f(x) = y$, completing the proof. □


Lemma 3.10 If $A \subset \mathbb{R}$ is compact, then $\sup A \in A$ and $\inf A \in A$, so the maximum
and minimum of $A$ are well defined.


Proof Since $A$ is bounded, $\sup A \in \mathbb{R}$. For $k = 1, 2, \ldots$, let $N_k$ represent the
interval $(\sup A - 1/k, \sup A]$, and let $A_k = N_k \cap A$. Then $A_k$ must be nonempty for
each $k$. (If not, we would have an upper bound of $A$ that was strictly smaller than
$\sup A$, which is impossible.) Pick any point from $A_k$, and label it $x_k$.

We claim that $x_k \to \sup A$ as $k \to \infty$. This follows simply from the observation
that since $x_k \in (\sup A - 1/k, \sup A]$ for each $k$, it must be the case that $d(x_k, \sup A) <
1/k$. Therefore, $d(x_k, \sup A) \to 0$ as $k \to \infty$, establishing the claim.

But $x_k \in A_k \subset A$ for each $k$, and $A$ is a closed set, so the limit of the sequence
$\{x_k\}$ must be in $A$, establishing one part of the result. The other part, that $\inf A \in A$,
is established analogously. □


The Weierstrass Theorem is an immediate consequence of these lemmata. □
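
For readers who want the final step spelled out, the way the two lemmata combine can be sketched as follows for the case of a maximum. This is only an illustration of the argument the text calls immediate; it takes the theorem to assert that a continuous $f$ on a (nonempty) compact set $D$ attains a maximum and a minimum, and the label $x^*$ is introduced here for convenience.

```latex
\begin{align*}
&\text{By Lemma 3.9, } f(D) \subset \mathbb{R} \text{ is compact.} \\
&\text{By Lemma 3.10 applied to } A = f(D), \quad \sup f(D) \in f(D). \\
&\text{Hence there is } x^* \in D \text{ with } f(x^*) = \sup f(D) \geq f(x) \text{ for all } x \in D.
\end{align*}
```

The same steps with $\inf f(D)$ in place of $\sup f(D)$ give a minimum.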


3.4 Exercises 


1. Prove the following statement or provide a counterexample: if f is a continuous 
function on a bounded (but not necessarily closed) set D, then sup f(D) is finite. 


2. Suppose $D \subset \mathbb{R}^n$ is a set consisting of a finite number of points $\{x_1, \ldots, x_p\}$.
Show that any function $f: D \to \mathbb{R}$ has a maximum and a minimum on $D$. Is this
result implied by the Weierstrass Theorem? Explain your answer.


3. Call a function $f: \mathbb{R}^n \to \mathbb{R}$ nondecreasing if $x, y \in \mathbb{R}^n$ with $x \geq y$ implies
$f(x) \geq f(y)$. Suppose $f$ is a nondecreasing, but not necessarily continuous,
function on $\mathbb{R}^n$, and $D \subset \mathbb{R}^n$ is compact. Show that if $n = 1$, $f$ always has a
maximum on $D$. Show also that if $n > 1$, this need no longer be the case.


4. Give an example of a compact set $D \subset \mathbb{R}^n$ and a continuous function $f: D \to \mathbb{R}$
such that $f(D)$ consists of precisely $k$ points where $k \geq 2$. Is this possible if $D$
is also convex? Why or why not?


5. Let $f: \mathbb{R}_+ \to \mathbb{R}$ be continuous on $\mathbb{R}_+$. Suppose that $f$ also satisfies the conditions
that $f(0) = 1$ and $\lim_{x \to \infty} f(x) = 0$. Show that $f$ must have a maximum on
$\mathbb{R}_+$. What about a minimum?


6. Let $C$ be a compact subset of $\mathbb{R}$. Let $g: C \to \mathbb{R}$, and let $f: \mathbb{R} \to \mathbb{R}$ be a continuous
function. Does the composition $f \circ g$ have a maximum on $C$?


7. Let $g: \mathbb{R} \to \mathbb{R}$ be a function (not necessarily continuous) which has a maximum
and minimum on $\mathbb{R}$. Let $f: \mathbb{R} \to \mathbb{R}$ be a function which is continuous on the
range of $g$. Does $f \circ g$ necessarily have a maximum on $\mathbb{R}$? Prove your answer,
or provide a counterexample.


8. Use the Weierstrass Theorem to show that a solution exists to the expenditure
minimization problem of subsection 2.3.2, as long as the utility function $u$ is
continuous on $\mathbb{R}^n_+$ and the price vector $p$ satisfies $p \gg 0$. What if one of these
conditions fails?


9. Consider the profit maximization problem of subsection 2.3.3. For simplicity,
assume that the inverse-demand curve facing the firm is constant, i.e., that $p(y) =
p > 0$ for all output levels $y \geq 0$. Assume also that the technology $g$ is continuous
on $\mathbb{R}^n_+$, and that the input price vector $w$ satisfies $w \gg 0$. Are these assumptions
sufficient to guarantee the existence of a solution to the profit maximization
problem? Why or why not?


10. Show that the budget set $F(p, w) = \{(x, l) \in \mathbb{R}^{n+1}_+ \mid p \cdot x \leq w(H - l),\ l \leq H\}$
in the consumption-leisure choice problem of subsection 2.3.5 is compact if and
only if $p \gg 0$.


11. Consider the portfolio selection problem of subsection 2.3.6. The problem is
said to admit arbitrage, if there is some portfolio $\phi \in \mathbb{R}^N$ such that $p \cdot \phi \leq 0$
and $Z'\phi > 0$, where $p = (p_1, \ldots, p_N)$ is the vector of security prices, and
$Z = (z_{ij})$ is the $N \times S$ matrix of security payoffs. (In words, the problem admits
arbitrage if it is possible to create a portfolio at nonpositive cost, whose payoffs
are nonnegative in all states, and are strictly positive in some states.) Suppose
that the utility function $U: \mathbb{R}^S_+ \to \mathbb{R}$ in this problem is continuous and strictly
increasing on $\mathbb{R}^S_+$ (that is, $y, y' \in \mathbb{R}^S_+$ with $y > y'$ implies $U(y) > U(y')$).
Show that a solution exists to the utility maximization problem if and only if
there is no arbitrage.


12. Under what conditions (if any) on the primitives does the problem of the
optimal provision of public goods (see subsection 2.3.8) meet the continuity and
compactness conditions of the Weierstrass Theorem?


13. A monopolist faces a downward sloping inverse-demand curve $p(x)$ that satisfies
$p(0) < \infty$ and $p(x) \geq 0$ for all $x \in \mathbb{R}_+$. The cost of producing $x$ units is given
by $c(x) \geq 0$ where $c(0) = 0$. Suppose $p(\cdot)$ and $c(\cdot)$ are both continuous on
$\mathbb{R}_+$. The monopolist wishes to maximize $\pi(x) = x p(x) - c(x)$ subject to the
constraint $x \geq 0$.

(a) Suppose there is $x^* > 0$ such that $p(x^*) = 0$. Show that the Weierstrass
Theorem can be used to prove the existence of a solution to this problem.

(b) Now suppose instead there is $x' > 0$ such that $c(x) \geq x p(x)$ for all $x \geq
x'$. Show, once again, that the Weierstrass Theorem can be used to prove
existence of a solution.

(c) What about the case where $p(x) = \bar{p}$ for all $x$ (the demand curve is infinitely
elastic) and $c(x) \to \infty$ as $x \to \infty$?


14. A consumer, who lives for two periods, has the utility function $v(c(1), c(2))$,
where $c(t) \in \mathbb{R}^n_+$ denotes the consumer’s consumption bundle in period $t$, $t =
1, 2$. The price vector in period $t$ is given by $p(t) \in \mathbb{R}^n_+$, $p(t) = (p_1(t), \ldots, p_n(t))$.
The consumer has an initial wealth of $W_0$, but has no other income. Any amount
not spent in the first period can be saved and used for the second period. Savings
earn interest at a rate $r > 0$. (Thus, a dollar saved in period 1 becomes $1 + r$ dollars
in period 2.)

(a) Assume the usual nonnegativity constraints on consumption and set up the
consumer’s utility-maximization problem.

(b) Show that the feasible set in this problem is compact if and only if $p(1) \gg 0$
and $p(2) \gg 0$.

15. A fishery earns a profit of $\pi(x)$ from catching and selling $x$ units of fish. The firm
owns a pool which currently has $y_1$ fish in it. If $x \in [0, y_1]$ fish are caught this
period, the remaining $i = y_1 - x$ fish will grow to $f(i)$ fish by the beginning of the
next period, where $f: \mathbb{R}_+ \to \mathbb{R}_+$ is the growth function for the fish population.
The fishery wishes to set the volume of its catch in each of the next three periods
so as to maximize the sum of its profits over this horizon. That is, it solves:

\begin{align*}
\text{Maximize} \quad & \pi(x_1) + \pi(x_2) + \pi(x_3) \\
\text{subject to} \quad & x_1 \leq y_1 \\
& x_2 \leq y_2 = f(y_1 - x_1) \\
& x_3 \leq y_3 = f(y_2 - x_2)
\end{align*}

and the nonnegativity constraints that $x_i \geq 0$, $i = 1, 2, 3$.

Show that if $\pi$ and $f$ are continuous on $\mathbb{R}_+$, then the Weierstrass Theorem
may be used to prove that a solution exists to this problem. (This is immediate
if one can show that the continuity of $f$ implies the compactness of the set of
feasible triples $(x_1, x_2, x_3)$.)


4 


Unconstrained Optima 


4.1 “Unconstrained” Optima 


We now turn to a study of optimization theory under assumptions of differentiability.
Our principal objective here is to identify necessary conditions that the derivative of
$f$ must satisfy at an optimum. We begin our analysis with an examination of what
are called “unconstrained” optima. The terminology, while standard, is somewhat
unfortunate, since unconstrained does not literally mean the absence of constraints.
Rather, it refers to the more general situation where the constraints have no bite at
the optimal point, that is, a situation in which we can move (at least) a small distance
away from the optimal point in any direction without leaving the feasible set.¹
Formally, given a set $D \subset \mathbb{R}^n$, we define the interior of $D$ (denoted $\operatorname{int} D$) by

\[ \operatorname{int} D = \{x \in D \mid \text{there is } r > 0 \text{ such that } B(x, r) \subset D\}. \]


A point x where f achieves a maximum will be called an unconstrained maximum 
of f if x € int D. Unconstrained minima are defined analogously. 

One observation is important before proceeding to the analysis. The concepts of
maxima and minima are global concepts, i.e., they involve comparisons between the
value of $f$ at a particular point $x$ and all other feasible points $z \in D$. On the other
hand, differentiability is a local property: the derivative of $f$ at a point $x$ tells us
something about the behavior of $f$ in a neighborhood of $x$, but nothing at all about
the behavior of $f$ elsewhere on $D$. Intuitively, this suggests that if there exists an
open set $X \subset D$ such that $x$ is a maximum of $f$ on $X$ (but maybe not on all of $D$),
the behavior of $f$ at $x$ would be similar to the behavior of $f$ at a point $z$ which is an
unconstrained maximum of $f$ on all of $D$. (Compare the behavior of $f$ at $x^*$ and $y^*$
in Figure 4.1.) This motivates the following definitions:


¹The term “interior optimum” is sometimes used to represent what we have called an “unconstrained
optimum.” While this alternative term is both more descriptive and less misleading, it is not as popular, at
least in the context of optimization theory. Consequently, we stick to “unconstrained optimum.”

Fig. 4.1. Global and Local Optima


• A point $x \in D$ is a local maximum of $f$ on $D$ if there is $r > 0$ such that $f(x) \geq
f(y)$ for all $y \in D \cap B(x, r)$.

• A point $x \in D$ is an unconstrained local maximum of $f$ on $D$ if there is $r > 0$
such that $B(x, r) \subset D$, and $f(x) \geq f(y)$ for all $y \in B(x, r)$.


To maintain a distinction between the concepts, we will sometimes refer to a point 
x which is a maximum of f on all of D as a global maximum. Local and global 
minima of f on D are defined analogously. 

The next two sections classify the behavior of the derivatives of $f$ at unconstrained
local maxima and minima. Section 4.2 deals with “first-order” conditions, namely, the
behavior of the first derivative $Df$ of $f$ around an optimal point $x^*$, while Section 4.3
presents “second-order” conditions, that is, those relating to the behavior of the
second derivative $D^2 f$ of $f$ around the optimum $x^*$.


4.2 First-Order Conditions 


Our first result states that the derivative of $f$ must be zero at every unconstrained
local maximum or minimum. At an intuitive level, this result is easiest to see in the
one-dimensional case: suppose $x^*$ were a local maximum (say), and $f'(x^*) \neq 0$.
Then, if $f'(x^*) > 0$, it would be possible to increase the value of $f$ by moving a
small amount to the right of $x^*$, while if $f'(x^*) < 0$, the same conclusion could
be obtained by moving a small amount to the left of $x^*$ (see Figure 4.2); at an
unconstrained point $x^*$, such moves would be feasible, and this would violate the
definition of a local maximum. A similar argument shows that $f'(x^*)$ must be zero
at a local minimum $x^*$.

Fig. 4.2. First-Order Conditions in $\mathbb{R}$


Theorem 4.1 Suppose $x^* \in \operatorname{int} D \subset \mathbb{R}^n$ is a local maximum of $f$ on $D$, i.e., there is
$r > 0$ such that $B(x^*, r) \subset D$ and $f(x^*) \geq f(x)$ for all $x \in B(x^*, r)$. Suppose also
that $f$ is differentiable at $x^*$. Then $Df(x^*) = 0$. The same result is true if, instead,
$x^*$ were a local minimum of $f$ on $D$.


Proof See Section 4.5. □


It must be emphasized that Theorem 4.1 only provides a necessary condition for an 
unconstrained local optimum. The condition is in no way also sufficient; that is, the 
theorem does not state that if Df(x*) = 0 for some x* in the interior of the feasible 
set, then x* must be either an unconstrained local maximum or an unconstrained 
local minimum. In point of fact, it need be neither. Consider the following example: 


Example 4.2 Let $D = \mathbb{R}$, and let $f: \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$ for $x \in \mathbb{R}$.
Then, we have $f'(0) = 0$, but 0 is neither a local maximum nor a local minimum
of $f$ on $\mathbb{R}$. For, any open ball around 0 must contain a point $x > 0$ and a point
$z < 0$, and, of course, $x > 0$ implies $f(x) = x^3 > 0 = f(0)$, while $z < 0$ implies
$f(z) = z^3 < 0 = f(0)$. □


In the sequel, we will refer to any point x* that satisfies the first-order condition 
Df(x*) = 0 as a critical point of f on D. To restate the point made by Theorem 4.1 
and Example 4.2, every unconstrained local optimum must be a critical point, but
there could exist critical points that are neither local maxima nor local minima. 


4.3 Second-Order Conditions 


The first-order conditions for unconstrained local optima do not distinguish between
maxima and minima. To obtain such a distinction in the behavior of $f$ at an optimum,
we need to examine the behavior of the second derivative $D^2 f$ of $f$. A preliminary
definition first:


• A local maximum $x$ of $f$ on $D$ is called a strict local maximum if there is $r > 0$
such that $f(x) > f(y)$ for all $y \in B(x, r) \cap D$, $y \neq x$.


Strict local minima are defined analogously. The following theorem classifies the
behavior of $D^2 f$ at unconstrained local optima.


Theorem 4.3 Suppose $f$ is a $C^2$ function on $D \subset \mathbb{R}^n$, and $x$ is a point in the interior
of $D$.

1. If $f$ has a local maximum at $x$, then $D^2 f(x)$ is negative semidefinite.

2. If $f$ has a local minimum at $x$, then $D^2 f(x)$ is positive semidefinite.

3. If $Df(x) = 0$ and $D^2 f(x)$ is negative definite at some $x$, then $x$ is a strict local
maximum of $f$ on $D$.

4. If $Df(x) = 0$ and $D^2 f(x)$ is positive definite at some $x$, then $x$ is a strict local
minimum of $f$ on $D$.


Proof See Section 4.6. □


Observe that unlike Theorem 4.1, the conditions of Theorem 4.3 are not all necessary
conditions. In particular, while parts 1 and 2 of Theorem 4.3 do identify necessary
conditions that the second derivative $D^2 f$ must satisfy at local optima, parts 3 and 4
are actually sufficient conditions that identify specific points as being local optima.
Unfortunately, these necessary and sufficient conditions are not the same: while the
necessary conditions pertain to semidefiniteness of $D^2 f$ at the optimal point, the
sufficient conditions require definiteness of this matrix. This is problematic from the
point of view of applications. Suppose we have a critical point $x^*$ at which $D^2 f(x^*)$
is negative semidefinite (say), but not negative definite. Then, Part 3 of Theorem 4.3
does not allow us to conclude that $x^*$ must be a local maximum. On the other hand,
we cannot rule out the possibility of $x^*$ being a local maximum either, because
by Part 1 of the theorem, $D^2 f(x^*)$ need only be negative semidefinite at a local
maximum $x^*$.
This situation could be improved if we could either strengthen parts 1 and 2 of the
theorem to


$D^2 f(x)$ must be negative definite at a local maximum $x$, and positive definite at a local
minimum $x$,


or strengthen parts 3 and 4 of the theorem to


If $Df(x) = 0$ and $D^2 f(x)$ is negative semidefinite (resp. positive semidefinite), then $x$ is
a (possibly non-strict) local maximum (resp. local minimum) of $f$ on $D$.


However, such strengthening is impossible, as the following examples demonstrate. 
The first example shows that parts 1 and 2 of the theorem cannot be improved upon, 
while the second does the same for parts 3 and 4. 


Example 4.4 Let $D = \mathbb{R}$, and let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be defined by
$f(x) = x^4$ and $g(x) = -x^4$, respectively, for all $x \in \mathbb{R}$. Since $x^4 \geq 0$ everywhere,
and $f(0) = g(0) = 0$, it is clear that 0 is a global minimum of $f$ on $D$, and a
global maximum of $g$ on $D$. However, $f''(0) = g''(0) = 0$, so viewed as $1 \times 1$
matrices, $f''(0)$ is positive semidefinite, but not positive definite, while $g''(0)$ is
negative semidefinite, but not negative definite. □


Example 4.5 As in Example 4.2, let $D = \mathbb{R}$, and let $f: \mathbb{R} \to \mathbb{R}$ be given by $f(x) =
x^3$ for all $x \in \mathbb{R}$. Then, $f'(x) = 3x^2$ and $f''(x) = 6x$, so $f'(0) = f''(0) = 0$.
Thus, $f''(0)$, viewed as a $1 \times 1$ matrix, is both positive semidefinite and negative
semidefinite (but not either positive definite or negative definite). If parts 3 and 4 of
Theorem 4.3 only required semidefiniteness, $f''(0)$ would pass both tests; however,
0 is neither a local maximum nor a local minimum of $f$ on $D$. □
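
The gap between the necessary and the sufficient conditions can be made concrete with a small sketch of the one-dimensional test. This is an illustration only: the function name `classify_critical_point` is hypothetical, the derivative values are entered by hand rather than computed, and the function $-x^2$ is an extra case added here for contrast.

```python
def classify_critical_point(d1, d2):
    """Apply the n = 1 case of Theorem 4.3, given f'(x) = d1 and f''(x) = d2."""
    if d1 != 0:
        return "not a critical point"
    if d2 < 0:
        return "strict local maximum"   # Part 3: negative definite
    if d2 > 0:
        return "strict local minimum"   # Part 4: positive definite
    # f''(x) = 0 is both positive and negative semidefinite: no conclusion.
    return "inconclusive"


# (f'(0), f''(0)) for each function, computed by hand.
cases = {
    "x^4": (0.0, 0.0),     # 0 is in fact a global minimum (Example 4.4)
    "-x^4": (0.0, 0.0),    # 0 is in fact a global maximum (Example 4.4)
    "x^3": (0.0, 0.0),     # 0 is neither (Example 4.5)
    "-x^2": (0.0, -2.0),   # extra case: Part 3 applies
}
results = {name: classify_critical_point(d1, d2) for name, (d1, d2) in cases.items()}
for name, verdict in results.items():
    print(name, "->", verdict)
```

All three functions from Examples 4.4 and 4.5 receive the same verdict even though their behavior at 0 differs, which is exactly why the semidefinite case cannot be settled by second derivatives alone.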


4.4 Using the First- and Second-Order Conditions 


Taken by itself, the first-order condition of Theorem 4.1 is of limited use in computing 
solutions to optimization problems for at least two reasons. First, the condition applies 
only to the cases where the optimum occurs at an interior point of the feasible set, 
whereas in most applications, some or all of the constraints will tend to matter. 
Second, the condition is only a necessary one: as evidenced by Example 4.2, it is 
possible for a point x to meet the first-order condition without x being either a local 
maximum or a local minimum, and, indeed, without an optimum even existing. 
Even if Theorem 4.1 is combined with the second-order conditions provided by 
Theorem 4.3, a material improvement in the situation does not result. Parts 1 and 2 of 
Theorem 4.3 also provide only necessary conditions, and the function of Example 4.5 
passes both of these conditions at x = 0, although 0 is not even a local maximum 
or local minimum. While parts 3 and 4 do provide sufficient conditions, these are 
sufficient for all local optima, and not just for global optima. Thus, the second-order
conditions can at most help ascertain if a point is a local optimum; they cannot help
determine if a critical point is a global optimum or not.² In particular, global optima
might fail to exist altogether (though several critical points, including local optima,
may exist), but the first- or second-order conditions will not help spot this problem.
Consider the following example:


Example 4.6 Let $D = \mathbb{R}$, and let $f: D \to \mathbb{R}$ be given by

\[ f(x) = 2x^3 - 3x^2. \]

It is easily checked that $f$ is a $C^2$ function on $\mathbb{R}$, and that there are precisely two
points at which $f'(x) = 0$: namely, at $x = 0$ and $x = 1$. Invoking the second-order
conditions, we get $f''(0) = -6$, while $f''(1) = +6$. Thus, the point 0 is a strict local
maximum of $f$ on $D$, while the point 1 is a strict local minimum of $f$ on $D$.
However, there is nothing in the first- or second-order conditions that will help
determine whether these points are global optima. In fact, they are not: global optima
do not exist in this problem, since $\lim_{x \to +\infty} f(x) = +\infty$, and $\lim_{x \to -\infty} f(x) =
-\infty$. □
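
The claims in this example are easy to confirm numerically. The sketch below is illustrative only; the derivative formulas are worked out by hand, and the evaluation points $\pm 10$ are an arbitrary choice.

```python
def f(x):
    return 2 * x**3 - 3 * x**2


def f1(x):
    """First derivative: f'(x) = 6x^2 - 6x."""
    return 6 * x**2 - 6 * x


def f2(x):
    """Second derivative: f''(x) = 12x - 6."""
    return 12 * x - 6


critical_points = [0, 1]
for x in critical_points:
    print(x, f(x), f1(x), f2(x))

# Points far from the origin show that neither critical point is a global optimum.
print(f(10), f(-10))
```

The value of `f` at 10 exceeds the value at the strict local maximum 0, and the value at -10 falls below the value at the strict local minimum 1, so the first- and second-order conditions alone give no hint that no global optimum exists.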


On the other hand, in problems in which it is known a priori that a solution³ does
exist (say, because the problem meets the conditions of the Weierstrass Theorem),
Theorem 4.1 can, sometimes, come in handy in computing the solution. The simplest
case where this is true is where the problem at hand is known to have an unconstrained
solution. In this case, the set $N^*$ of all points that meet the first-order conditions must
contain this unconstrained optimum point (denoted, say, $x^*$); moreover, since $x^*$ is
an optimum on all of the feasible set, it must certainly be an optimum on $N^*$. Thus,
by solving for $N^*$ and finding the point that optimizes the objective function over
$N^*$, the solution to the problem may be identified.

More generally, even if it is not known that the optimum must be in the interior of 
the feasible set, a modified version of this procedure can be successfully employed, 
provided the set of boundary points of the feasible set is “small” (say, is a finite 
set). To wit, the optimum must either occur in the interior of the feasible set, or
on its boundary. Since it must satisfy the first-order conditions in the former case, 
it suffices, in order to identify the optimum, to compare the optimum value of the 
objective function on the boundary, with those points in the interior that meet the first- 
order conditions. The point that optimizes f over this restricted set is the optimum 
we are seeking. Here is a simple example of a situation where this procedure works: 


²Second-order conditions can also help to rule out candidate critical points. For instance, a critical point
cannot be a global maximum if it passes the test for a strict local minimum.

³A “solution,” as always, refers to a global maximum or minimum, whichever is the object of study in the
specific problem at hand.
Example 4.7 Consider the problem of maximizing $f(x) = 4x^3 - 5x^2 + 2x$ over
the unit interval $[0, 1]$. Since $[0, 1]$ is compact, and $f$ is continuous on this interval,
the Weierstrass Theorem shows that $f$ has a maximum on this interval. There are
two possibilities: either the maximum occurs at one of the boundary points 0 or 1,
or it is an unconstrained maximum. In the latter case, it must meet the first-order
conditions: $f'(x) = 12x^2 - 10x + 2 = 0$. The only points that satisfy this condition
are $x = 1/2$ and $x = 1/3$. Evaluating $f$ at the four points 0, 1/3, 1/2, and 1 shows
that $x = 1$ is the point where $f$ is maximized on $[0, 1]$. An identical procedure also
shows that $x = 0$ is the point where $f$ is minimized on $[0, 1]$. □
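
The comparison carried out in Example 4.7 can be sketched in a few lines. This is an illustration under stated assumptions, not a general-purpose routine: it relies on the discriminant of $f'$ being a perfect square so that exact rational arithmetic applies, and the variable names are chosen here for convenience.

```python
from fractions import Fraction
import math


def f(x):
    return 4 * x**3 - 5 * x**2 + 2 * x


# Solve f'(x) = 12x^2 - 10x + 2 = 0 with the quadratic formula.
a, b, c = 12, -10, 2
disc = b * b - 4 * a * c           # a perfect square for this example
root = math.isqrt(disc)
roots = [Fraction(-b - root, 2 * a), Fraction(-b + root, 2 * a)]

# Keep only critical points in the interior of [0, 1], then add the boundary.
interior = [x for x in roots if 0 < x < 1]
candidates = [Fraction(0), *interior, Fraction(1)]

values = {x: f(x) for x in candidates}
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
for x, v in values.items():
    print(x, v)
print("maximum at", x_max, "minimum at", x_min)
```

Because the Weierstrass Theorem guarantees that a maximum and a minimum exist, and each must be either a boundary point or an interior critical point, comparing these four values is enough to identify both.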


In most economic applications, however, the set of boundary points is quite large, 
so carrying out the appropriate comparisons is a non-trivial task. For example, in the 
utility maximization problem of subsection 2.3.1, the entire line segment 


\[ \{x \mid p \cdot x = I,\ x \geq 0\} \]


is part of the boundary of the budget set $B(p, I) = \{x \mid p \cdot x \leq I,\ x \geq 0\}$.
Nonetheless, this procedure is indicative of how global optima may be identified 
by combining knowledge of the existence of a solution with first-order necessary 
conditions for local optima. We will return to this point again in the next two chapters. 

It bears stressing that the a priori knowledge of existence of an optimum is impor- 
tant if the first-order conditions (or, more generally, any set of necessary conditions) 
are to be used successfully in locating an optimum. If such knowledge is lacking, 
then, as in Example 4.6, the set of points satisfying the necessary conditions may fail 
to contain the global optimum, because no global optimum may exist. 

Finally, it must be stated that the limited importance given in this section to the 
use of second-order conditions in locating optima is deliberate. The value of second- 
order conditions is often exaggerated. As we have seen, if a solution is known to exist, 
then it may be identified simply by comparing the value of the objective function 
at the points that satisfy the first-order conditions. The only role the second-order 
conditions can play here is, perhaps, to cut down on the number of points at which 
the comparison has to be carried out; for instance, if we are seeking a maximum of f, 
then all critical points can be ruled out that also satisfy the second-order conditions 
for a strict local minimum. On the other hand, if a priori knowledge of existence of a
solution is lacking, then as Example 4.6 shows, the most that second-order conditions 
can do is to help identify local optima, and this is evidently of limited value. 


4.5 A Proof of the First-Order Conditions 


We provide two proofs of Theorem 4.1. The first illustrates the technique of
“bootstrapping,” a technique that we employ repeatedly in this book. That is, we first prove
the result for a special case, and then use this result to establish the general case. The
second proof, which is shorter, establishes the result by appealing to properties of
the directional derivative (see Chapter 1).


First Proof We proceed in two steps. First, we establish the result for the case 
n = 1, i.e., when D is a subset of R. Then we use this to prove that the result holds 
for any n > 1. Our proofs only cover local maxima. The case of local minima can be 
proved analogously; the details are left to the reader. 


Case 1: $n = 1$

Since $f$ is assumed differentiable at $x^*$, it is the case that for any sequence $y_k \to x^*$,
we have

\[ \lim_{k \to \infty} \frac{f(y_k) - f(x^*)}{y_k - x^*} = f'(x^*). \]

Consider two sequences $\{y_k\}$ and $\{z_k\}$ such that (a) $y_k < x^*$ for all $k$, $y_k \to x^*$,
and (b) $z_k > x^*$ for all $k$, $z_k \to x^*$. For sufficiently large $k$, we must have $y_k, z_k \in
B(x^*, r)$, so for all large $k$, $f(x^*) \geq f(y_k)$ and $f(x^*) \geq f(z_k)$. Therefore, for all
large $k$,

\[ \frac{f(y_k) - f(x^*)}{y_k - x^*} \geq 0 \geq \frac{f(z_k) - f(x^*)}{z_k - x^*}. \]

Taking limits as $k$ goes to infinity along the $y_k$ sequence establishes $f'(x^*) \geq 0$,
while taking limits along the $z_k$ sequence establishes $f'(x^*) \leq 0$. These can hold
simultaneously only if $f'(x^*) = 0$, which establishes the desired result.


Case 2: $n > 1$

Suppose $f$ has a local maximum at $x^* \in D \subset \mathbb{R}^n$, and that $f$ is differentiable at $x^*$.
We will show that $\partial f(x^*)/\partial x_i = 0$ for any $i$. By the hypothesis that $Df(x^*)$ exists,
we must have $Df(x^*) = [\partial f(x^*)/\partial x_1, \ldots, \partial f(x^*)/\partial x_n]$, so this will complete the
proof.

For $i = 1, \ldots, n$, let $e_i \in \mathbb{R}^n$ be the $i$-th unit vector, i.e., the vector with a 1 in the
$i$-th place and zeros elsewhere. Fix any $i$, and define a function $g$ on $\mathbb{R}$ by

\[ g(t) = f(x^* + t e_i). \]

Note that $g(0) = f(x^*)$. Moreover, for any sequence $t_k \to 0$, we have for each $k$,

\[ \frac{g(t_k) - g(0)}{t_k} = \frac{f(x^* + t_k e_i) - f(x^*)}{t_k}, \]

and the right-hand side (RHS) of this expression converges to $\partial f(x^*)/\partial x_i$ as $k \to \infty$.
Therefore, $g$ is differentiable at 0, and $g'(0) = \partial f(x^*)/\partial x_i$.

Now, $d(x^* + t e_i, x^*) = \|x^* + t e_i - x^*\| = |t|$, so for $t$ sufficiently close to zero,
we must have $(x^* + t e_i) \in B(x^*, r)$. Therefore, for sufficiently small $|t|$,

\[ g(t) = f(x^* + t e_i) \leq f(x^*) = g(0), \]

implying that $g$ has a local maximum at $t = 0$. By Case 1, this means we must have
$g'(0) = 0$, and the proof is complete. □


Second Proof Once again, we prove the result for local maxima. The case of local
minima may be handled analogously. So suppose $x^*$ is a local maximum of $f$ on $D$,
i.e., there is $r > 0$ such that $f(x^*) \geq f(y)$ for all $y \in B(x^*, r)$. Suppose also that $f$
is differentiable at $x^*$.

Recall from Chapter 1 that the differentiability of $f$ at $x^*$ implies that the
(one-sided) directional derivative $Df(x^*; h)$ exists for any $h \in \mathbb{R}^n$, with, in fact,
$Df(x^*; h) = Df(x^*) \cdot h$ (see Theorem 1.56). We will show that $Df(x^*; h) = 0$ for
all $h \in \mathbb{R}^n$. This will establish the desired result since $Df(x^*) \cdot h = 0$ can hold for
all $h$ only if $Df(x^*) = 0$.

First, we claim that $Df(x^*; h) \leq 0$ for any $h \in \mathbb{R}^n$. Suppose this were not true
for some $h$, so $Df(x^*; h) > 0$. From the definition of $Df(x^*; h)$, we now have

\[ Df(x^*; h) = \lim_{t \to 0+} \frac{f(x^* + th) - f(x^*)}{t} > 0. \]

Therefore, for all $t > 0$ sufficiently small, it is the case that $f(x^* + th) - f(x^*) > 0$.
However, for $t < r/\|h\|$, we also have $d(x^* + th, x^*) = t\|h\| < r$, so $(x^* + th) \in
B(x^*, r)$. Since $x^*$ is a maximum on $B(x^*, r)$, this implies $f(x^* + th) \leq f(x^*)$, a
contradiction which establishes the claim.

Now, pick any $h \in \mathbb{R}^n$, and let $h_1 = h$, $h_2 = -h$. Since $h_1, h_2 \in \mathbb{R}^n$, we
must then have $Df(x^*; h_1) = Df(x^*) \cdot h_1 = Df(x^*) \cdot h \leq 0$, and $Df(x^*; h_2) =
Df(x^*) \cdot h_2 = -Df(x^*) \cdot h \leq 0$. But $Df(x^*) \cdot h \leq 0$ and $-Df(x^*) \cdot h \leq 0$ are
possible simultaneously only if $Df(x^*) \cdot h = 0$. Since $h$ was chosen arbitrarily, we
have $Df(x^*) \cdot h = 0$ for all $h$. □


4.6 A Proof of the Second-Order Conditions 
As with the first proof of Theorem 4.1, we adopt a two-step procedure to prove
Theorem 4.3. We first prove the result for the case where n = 1 (i.e., D ⊂ R), and
then use this result to prove the general case.


Case 1: n = 1

When n = 1, the second derivative f''(x) of f at x is simply a real number.
Therefore, Parts 1 and 2 of the theorem will be proved if we prove Parts 3 and 4. For, suppose
Parts 3 and 4 are true. Suppose further that x is a local maximum, so f'(x) = 0. If
f''(x) ≰ 0, we must have f''(x) > 0. But this will imply by Part 4 that x is actually a
strict local minimum, which is a contradiction because a strict local minimum cannot
also be a local maximum.

We prove Part 3 here. Part 4 may be proved by a completely analogous procedure.
Suppose x is in the interior of D and satisfies f'(x) = 0 and f''(x) < 0. Since
f''(x) < 0, the first derivative f' must be strictly decreasing in a neighborhood
(denoted, say, B(x, r)) of x. Since f'(x) = 0, a point z ∈ B(x, r) must satisfy
f'(z) > 0 if z < x, and f'(z) < 0 if z > x. In turn, these signs of f' mean that f
is strictly increasing to the left of x (i.e., f(x) > f(z) if z ∈ B(x, r), z < x) and
strictly decreasing to the right of x (i.e., f(x) > f(z) if z ∈ B(x, r), z > x). This
states precisely that x is a strict local maximum.


Case 2: n > 1

When n > 1, Parts 3 and 4 are no longer the contrapositives of Parts 1 and 2,
respectively, since the failure of an n × n matrix to be positive definite does not make
it negative semidefinite. So a proof of Parts 1 and 2 will not suffice to establish Parts 3
and 4, and vice versa.

We prove Part 1 first. Let x be an unconstrained local maximum of f on D. We
have to show that for any z in R^n, z ≠ 0, we have z'D²f(x)z ≤ 0.

Pick any z in R^n. Define the real-valued function g by g(t) = f(x + tz). Note
that g(0) = f(x). For |t| sufficiently small, (x + tz) will belong to the neighborhood
of x on which x is a local maximum of f. It follows that there is some ε > 0 such
that g(0) ≥ g(t) for all t ∈ (−ε, ε), i.e., that 0 is a local maximum of g. By case 1,
therefore, we must have g''(0) ≤ 0. On the other hand, it follows from the definition
of g that g''(t) = z'D²f(x + tz)z, so that z'D²f(x)z = g''(0) ≤ 0, as desired. This
proves Part 1. Part 2 is proved similarly and is left to the reader.
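
The identity g''(0) = z'D²f(x)z used in the argument above can be checked numerically. The sketch below is only an illustration under stated assumptions: the quadratic function f, its hand-computed Hessian, the point x = (0, 0), and the direction z = (1, −2) are hypothetical choices that do not appear in the text.

```python
# Finite-difference check of g''(0) = z' D^2 f(x) z, where g(t) = f(x + t z).
# The function f, the point, and the direction are illustrative assumptions.

def f(x, y):
    # Hypothetical example: f(x, y) = -(x^2 + x*y + y^2), maximized at the origin.
    return -(x * x + x * y + y * y)

# Hessian of f computed by hand; it is constant because f is quadratic.
HESSIAN = [[-2.0, -1.0], [-1.0, -2.0]]

def quadratic_form(matrix, z):
    # Evaluates z' M z for a 2 x 2 matrix M.
    return sum(z[i] * matrix[i][j] * z[j] for i in range(2) for j in range(2))

def second_derivative_along(point, z, step=1e-4):
    # Central second difference of g(t) = f(point + t z) at t = 0.
    def g(t):
        return f(point[0] + t * z[0], point[1] + t * z[1])
    return (g(step) - 2.0 * g(0.0) + g(-step)) / (step * step)

x_star = (0.0, 0.0)
direction = (1.0, -2.0)
g_second = second_derivative_along(x_star, direction)
form_value = quadratic_form(HESSIAN, direction)
print(g_second, form_value)
```

Both numbers are nonpositive, as Part 1 requires at a local maximum.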

To see Part 3, suppose Df(x) = 0 and D²f(x) is negative definite. Since f is
C², by Corollary 1.60, there exists ε > 0 such that D²f(w) is negative definite for
all w ∈ B(x, ε). Fix ε. We will show that f(x) > f(w) for all w ∈ B(x, ε), w ≠ x.

Let S = {z ∈ R^n | ||z|| = 1}. Define B = {y ∈ R^n | y = x + tz, z ∈ S, |t| < ε}.
We claim that B = B(x, ε).

To see this, first note that for any y = x + tz ∈ B, d(x, y) = ||x + tz − x|| =
|t| ||z|| = |t| < ε, so certainly B ⊂ B(x, ε). On the other hand, pick any w ∈ B(x, ε).
If w = x, then w = x + 0·z for any z ∈ S, so w ∈ B. Otherwise,
define t = d(x, w) = ||x − w||, and z = (w − x)/t. Then, since w ∈ B(x, ε), we
have 0 < t < ε, and of course, ||z|| = ||w − x||/||w − x|| = 1, so z ∈ S. By
construction, tz = w − x or x + tz = w. This establishes B(x, ε) ⊂ B. Therefore,
B(x, ε) = B, as claimed.

We will prove that x is a strict local maximum of f on B. Fix any z ∈ S. Let
g(t) = f(x + tz), t ∈ (−ε, ε). Since (x + tz) ∈ B = B(x, ε), it is the case
that D²f(x + tz) is negative definite for |t| < ε. The usual argument shows that
g'(0) = 0. Moreover, g''(t) = z'D²f(x + tz)z, which is strictly negative for |t| < ε.
This implies that the point 0 is a strict maximum of g on the interval |t| < ε. In turn,
this means that f(x) > f(x + tz) for all 0 < |t| < ε.

Since z was arbitrary, this inequality holds for any z ∈ S on 0 < |t| < ε. This states
precisely that f(x) > f(w) for all w ∈ B(x, ε), w ≠ x, proving Part 3. Part 4 is proved
analogously. □


4.7 Exercises 


1. Is Theorem 4.1 valid if x* ¢ int D? If yes, provide a proof. If not, a counterex- 
ample. 

2. Find all the critical points (i.e., points where f'(x) = 0) of the function f: R →
R, defined as f(x) = x − x² − x³ for x ∈ R. Which of these points can you
identify as local maxima or minima using the second-order test? Are any of these
global optima?

3. Suppose x € int D is a local minimum of f on D. Suppose also that f is 
differentiable at x. Prove, without appealing to Theorem 4.1, that Df(x) = 0. 

4. Find and classify the critical points (local maximum, local minimum, neither) of 
each of the following functions. Are any of the local optima also global optima? 
(a) f(x, y) = 2x³ + xy² + 5x² + y².

(b) f(x, y) =e (x +y? +2y). 
(c) f(x, y) = xy(a − x − y).
(d) f(x, y) = xsin y. 

© f(x, y) =x +y y. 
(f) f(x,y) = x4 Er — x3, 

(g) f@, y) = Tate 


(h) f(x, y) = (x⁴/32) + x²y² − x − y².

5. Find the maximum and minimum values of
f(x, y) = 2 + 2x + 2y − x² − y²
on the set {(x, y) ∈ R^2_+ | x + y = 9} by representing the problem as an
unconstrained optimization problem in one variable.


6. Let f: R → R have a local maximum at x*. Show that the following inequalities
(called the "generalized first-order conditions") hold regardless of whether f is
assumed differentiable or not:

\[ \liminf_{y \uparrow x^*} \frac{f(x^*) - f(y)}{x^* - y} \ge 0, \quad \text{and} \quad \limsup_{y \downarrow x^*} \frac{f(x^*) - f(y)}{x^* - y} \le 0. \]

(The expression y ↑ x is shorthand for y < x and y → x, while y ↓ x is
shorthand for y > x and y → x.) Explain whether the "lim sup" and "lim inf"
in these inequalities can be replaced with a plain "limit." Examine also whether
the weak inequalities can be replaced with strict ones, if x* is instead a strict
local maximum.


7. Suppose f: R → R has a local maximum at x that is not a strict local maximum.
Does this imply that f is constant in some neighborhood of x? Prove your answer
or provide a counterexample.

8. Let f: R_+ → R_+ be a C¹ function that satisfies f(0) = 0 and lim_{x→∞} f(x) =
0. Suppose there is only a single point x ∈ R_+ at which f'(x) = 0. Show that
x must be a global maximum of f on R_+.

9. Let g: R → R be a strictly increasing C² function. Let D be an open set in R^n
and let f: D → R also be a C² function. Finally, suppose that f has a strict local
maximum at x* ∈ D, and that D²f(x*) is negative definite. Use Theorem 4.3
to show that the composition g ∘ f also has a strict local maximum at x*.

10. Let f: R^n → R be a C² function, and let g = −f. Fix any x, and show that
the quadratic form D²g(x) is positive definite if and only if D²f(x) is negative
definite.


5 


Equality Constraints and the Theorem of Lagrange 


It is not often that optimization problems have unconstrained solutions. Typically, 
some or all of the constraints will matter. Over this chapter and the next, we examine 
necessary conditions for optima in such a context. 

If the constraints do bite at an optimum x, it is imperative, in order to characterize 
the behavior of the objective function f around x, to have some knowledge of what 
the constraint set looks like in a neighborhood of x. Thus, a first step in the analysis
of constrained optimization problems is to require some additional structure of the
constraint set D, beyond just that it be some subset of R^n. The structure that we
shall require, and the order in which our analysis will proceed, is the subject of the 
following section. 


5.1 Constrained Optimization Problems 
It is assumed in the sequel that the constraint set D has the form

\[ D = U \cap \{x \in \mathbb{R}^n \mid g(x) = 0,\ h(x) \ge 0\}, \]

where U ⊂ R^n is open, g: R^n → R^k, and h: R^n → R^l. We will refer to the functions
g = (g_1, …, g_k) as equality constraints, and to the functions h = (h_1, …, h_l) as
inequality constraints.¹

¹Note that we do not preclude the possibility that U could simply be all of R^n, that is, that D can be expressed
using only the inequality and equality constraints.

This specification for the constraint set D is very general, much more so than might
appear at first sight. Many problems of interest in economic theory can be written
in this form, including all of the examples outlined in Section 2.3. Nonnegativity
constraints are, for instance, easily handled: if a problem requires that x ∈ R^n_+, this
may be accomplished by defining the functions h_j: R^n → R by

\[ h_j(x) = x_j, \quad j = 1, \ldots, n, \]

and using the n inequality constraints

\[ h_j(x) \ge 0, \quad j = 1, \ldots, n. \]

More generally, requirements of the form α(x) ≥ a, β(x) ≤ b, or ψ(x) = c
(where a, b, and c are constants), can all be expressed in the desired form by simply
writing them as α(x) − a ≥ 0, b − β(x) ≥ 0, or c − ψ(x) = 0.

Thus for instance, the budget set B(p, I) = {x ∈ R^n_+ | p·x ≤ I} of the utility
maximization problem of subsection 2.3.1 can be represented using (n + 1) inequality
constraints. Indeed, if we define h_j(x) = x_j for j = 1, …, n, and

\[ h_{n+1}(x) = I - p \cdot x, \]

then it follows that B(p, I) is the set

\[ \{x \in \mathbb{R}^n \mid h_j(x) \ge 0,\ j = 1, \ldots, n+1\}. \]
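
The representation of the budget set by (n + 1) inequality constraints can be written out as a short membership test. This is a minimal sketch: the function names `inequality_constraints` and `in_budget_set`, and the prices and income used, are hypothetical choices made only for illustration.

```python
# Budget set B(p, I) written as {x | h_j(x) >= 0, j = 1, ..., n+1}.
# Names and numerical values are illustrative assumptions.

def inequality_constraints(x, prices, income):
    # h_j(x) = x_j for j = 1, ..., n, and h_{n+1}(x) = I - p.x
    values = list(x)
    values.append(income - sum(p * xi for p, xi in zip(prices, x)))
    return values

def in_budget_set(x, prices, income):
    # x belongs to B(p, I) exactly when every constraint value is nonnegative.
    return all(h >= 0 for h in inequality_constraints(x, prices, income))

prices = [2.0, 3.0]
income = 12.0
on_budget_line = in_budget_set([3.0, 2.0], prices, income)    # cost 2*3 + 3*2 = 12
too_expensive = in_budget_set([4.0, 2.0], prices, income)     # cost 14 exceeds income
negative_amount = in_budget_set([-1.0, 2.0], prices, income)  # violates x_1 >= 0
print(on_budget_line, too_expensive, negative_amount)
```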


As mentioned above, our analysis of constrained optimization problems in this
book comes in three parts. In this chapter, we study the case where all the constraints
are equality constraints, i.e., where the constraint set D can be represented as

\[ D = U \cap \{x \mid g(x) = 0\}, \]

where U ⊂ R^n is open, and g: R^n → R^k. In the sequel, we shall refer to these as
equality-constrained optimization problems.

Then, in Chapter 6, we study the complementary case where all the constraints
are inequality constraints, i.e., the constraint set has the form

\[ D = U \cap \{x \mid h(x) \ge 0\}, \]

where U ⊂ R^n is open, and h: R^n → R^l. We label these inequality-constrained
optimization problems. Finally, in Section 6.4, we combine these results into the
general case of mixed constraints, where the specification of D may involve both
equality and inequality constraints:

\[ D = U \cap \{x \in \mathbb{R}^n \mid g(x) = 0,\ h(x) \ge 0\}. \]


5.2 Equality Constraints and the Theorem of Lagrange 


The Theorem of Lagrange provides a powerful characterization of local optima of 
equality-constrained optimization problems in terms of the behavior of the objective 
function f and the constraint functions g at these points. The conditions the theorem 
describes may be viewed as the first-order necessary conditions for local optima 
in these problems. The statement of the theorem, and a discussion of some of its
components, is the subject of this section.


5.2.1 Statement of the Theorem

To state the Theorem of Lagrange, it is necessary to recall two pieces of notation
from Chapter 1. First, that ρ(A) denotes the rank of a matrix A; and, second, that
the derivative Dξ(x) of a function ξ: R^n → R^p is the p × n matrix whose (i, j)-th
entry is

\[ \frac{\partial \xi_i}{\partial x_j}(x), \quad i = 1, \ldots, p, \quad j = 1, \ldots, n. \]

Theorem 5.1 (The Theorem of Lagrange)  Let f: R^n → R, and g_i: R^n → R be
C¹ functions, i = 1, …, k. Suppose x* is a local maximum or minimum of f on the
set

\[ D = U \cap \{x \mid g_i(x) = 0,\ i = 1, \ldots, k\}, \]

where U ⊂ R^n is open. Suppose also that ρ(Dg(x*)) = k. Then, there exists a
vector λ* = (λ*_1, …, λ*_k) ∈ R^k such that

\[ Df(x^*) + \sum_{i=1}^{k} \lambda_i^* Dg_i(x^*) = 0. \]

Proof  See Section 5.6 below. □


Remark  In the sequel, if a pair (x*, λ*) satisfies the twin conditions that g(x*) = 0
and Df(x*) + Σ_{i=1}^k λ*_i Dg_i(x*) = 0, we will say that (x*, λ*) meets the
first-order conditions of the Theorem of Lagrange, or that (x*, λ*) meets the first-order
necessary conditions in equality-constrained optimization problems. □


It must be stressed that the Theorem of Lagrange only provides necessary
conditions for local optima x*, and, at that, only for those local optima x* which also meet
the condition that ρ(Dg(x*)) = k. These conditions are not asserted to be sufficient;
that is, the theorem does not claim that if there exist (x, λ) such that g(x) = 0, and
Df(x) + Σ_{i=1}^k λ_i Dg_i(x) = 0, then x must either be a local maximum or a local
minimum, even if x also meets the rank condition ρ(Dg(x)) = k. Indeed, it is an
easy matter to modify Example 4.2 slightly to show that these conditions cannot be
sufficient:


Example 5.2  Let f and g be functions on R² defined by f(x, y) = x³ + y³
and g(x, y) = x − y. Consider the equality-constrained optimization problem of
maximizing and minimizing f(x, y) over the set D = {(x, y) ∈ R² | g(x, y) = 0}.

Let (x*, y*) be the point (0, 0), and let λ* = 0. Then, g(x*, y*) = 0, so (x*, y*)
is a feasible point. Moreover, since Dg(x, y) = (1, −1) for any (x, y), it is clearly
the case that ρ(Dg(x*, y*)) = 1. Finally, since Df(x, y) = (3x², 3y²), we have

\[ Df(x^*, y^*) + \lambda^* Dg(x^*, y^*) = (0, 0) + 0 \cdot (1, -1) = (0, 0). \]

Thus, if the conditions of the Theorem of Lagrange were also sufficient, then (x*, y*)
would be either a local maximum or a local minimum of f on D. It is, quite evidently,
neither: we have f(x*, y*) = 0, but for every ε > 0, it is the case that (−ε, −ε) ∈ D
and (ε, ε) ∈ D, so f(−ε, −ε) = −2ε³ < 0 = f(x*, y*), while f(ε, ε) = 2ε³ >
0 = f(x*, y*). □
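
The computations in Example 5.2 are easy to confirm numerically. The following sketch simply evaluates the first-order condition at (0, 0) with λ* = 0 and then compares f at nearby feasible points; the variable names and the sample values of ε are our own illustrative choices.

```python
# Numerical companion to Example 5.2: f(x, y) = x^3 + y^3, g(x, y) = x - y.

def f(x, y):
    return x ** 3 + y ** 3

def grad_f(x, y):
    return (3.0 * x * x, 3.0 * y * y)

def grad_g(x, y):
    # Dg is the constant vector (1, -1), so the rank condition holds everywhere.
    return (1.0, -1.0)

x_star, y_star, lam_star = 0.0, 0.0, 0.0

# Left-hand side of Df(x*, y*) + lambda* Dg(x*, y*) = 0.
first_order = tuple(
    a + lam_star * b
    for a, b in zip(grad_f(x_star, y_star), grad_g(x_star, y_star))
)

# Feasible points (-eps, -eps) and (eps, eps) arbitrarily close to the origin.
comparisons = [(f(-eps, -eps), f(eps, eps)) for eps in (1e-1, 1e-2, 1e-3)]
print(first_order, comparisons)
```

The first-order condition holds, yet f takes both smaller and larger values than f(0, 0) = 0 at feasible points near the origin.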


In the subsections that follow, we focus on two aspects of the Theorem of Lagrange.
Subsection 5.2.2 examines the importance of the rank condition that ρ(Dg(x*)) = k,
while subsection 5.2.3 sketches an interpretation of the vector λ* = (λ*_1, …, λ*_k)
whose existence the theorem asserts.


5.2.2 The Constraint Qualification 


The condition in the Theorem of Lagrange that the rank of Dg(x*) be equal to the
number of constraints k is called the constraint qualification under equality
constraints. It plays a central role in the proof of the theorem; essentially, it ensures that
Dg(x*) contains an invertible k × k submatrix, which may be used to define the
vector λ* (for details, see Section 5.6 below).

More importantly, it turns out to be the case that if the constraint qualification is
violated, then the conclusions of the theorem may also fail. That is, if x* is a local
optimum at which ρ(Dg(x*)) < k, then there need not exist a vector λ* such that
Df(x*) + Σ_{i=1}^k λ*_i Dg_i(x*) = 0. The following example, which involves seemingly
well-behaved objective and constraint functions, illustrates this point:


Example 5.3  Let f: R² → R and g: R² → R be given by f(x, y) = −y, and
g(x, y) = y³ − x², respectively. Consider the equality-constrained optimization
problem

Maximize f(x, y) subject to (x, y) ∈ D = {(x', y') ∈ R² | g(x', y') = 0}.

Since x² ≥ 0 for any real number x, and the constraint requires that y³ = x², we
must have y ≥ 0 for any (x, y) ∈ D; moreover, y = 0 if and only if x = 0. It
easily follows that f attains a unique global maximum on D at the origin (x, y) =
(0, 0). At this global (and, therefore, also local) maximum, Dg(x, y) = (0, 0),
so ρ(Dg(0, 0)) = 0 < 1. Thus, the constraint qualification is violated. Moreover,
Df(x, y) = (0, −1) at any (x, y), which means that there cannot exist any λ ∈ R
such that Df(0, 0) + λDg(0, 0) = (0, 0). Thus, the conclusions of the Theorem of
Lagrange also fail. □


5.2.3 The Lagrangean Multipliers

The vector λ* = (λ*_1, …, λ*_k) described in the Theorem of Lagrange is called the
vector of Lagrangean multipliers corresponding to the local optimum x*. The i-th
multiplier λ*_i measures, in a sense, the sensitivity of the value of the objective function
at x* to a small relaxation of the i-th constraint g_i. We demonstrate this interpretation
of λ* under some assumptions designed to simplify the exposition.

We begin with a clarification of the notion of the "relaxation" of a constraint. To
this end, we will suppose in the rest of this subsection that the constraint functions g
are given in parametric form as

\[ g(x; c) = g(x) + c, \]

where c = (c_1, …, c_k) is a vector of constants. This assumption enables a formal
definition of the concept we need: a relaxation of the i-th constraint may now be
thought of as an increase in the value of the constant c_i.

Now, let C be some open set of feasible values of c. Suppose that for each c ∈ C,
there is a global optimum,² denoted x*(c), of f on the constraint set

\[ D = U \cap \{x \in \mathbb{R}^n \mid g(x; c) = 0\}. \]

²Everything in the sequel remains valid if x*(c) is just a local maximum of f on D.

Suppose further that, for each c ∈ C, the constraint qualification holds at x*(c), so
there exists λ*(c) ∈ R^k such that

\[ Df(x^*(c)) + \sum_{i=1}^{k} \lambda_i^*(c)\, Dg_i(x^*(c)) = 0. \]

Finally, suppose that x*(·) is a differentiable function on C, that is, that the optimum
changes smoothly with changes in the underlying parameters. Let

\[ F(c) = f(x^*(c)) \]

be the value of the objective function at the optimum given the parameter c. Since f
is C¹ and x*(·) has been assumed to be differentiable on C, F(·) is also differentiable
on C. We will show that

\[ DF(c) = \lambda^*(c), \]

that is, that ∂F(c)/∂c_i = λ*_i(c). In words, this states precisely that λ*_i(c) represents
the sensitivity of the objective function at x*(c) to a small relaxation in the i-th
constraint.

First, note that from the definition of F(·), we have

\[ DF(c) = Df(x^*(c))\, Dx^*(c), \]

where Dx*(c) is the n × k matrix whose (i, j)-th entry is ∂x*_i(c)/∂c_j. By the
first-order conditions at each c, we also have

\[ Df(x^*(c)) = -\sum_{i=1}^{k} \lambda_i^*(c)\, Dg_i(x^*(c)). \]

By combining the last two expressions, we obtain

\[ DF(c) = -\left( \sum_{i=1}^{k} \lambda_i^*(c)\, Dg_i(x^*(c)) \right) Dx^*(c) = -\sum_{i=1}^{k} \lambda_i^*(c)\, Dg_i(x^*(c))\, Dx^*(c). \]

Since x*(c) must be feasible at all c, it must identically be true for each i and for
all c that g_i(x*(c)) + c_i = 0. Differentiating with respect to c, and rearranging, we
obtain

\[ Dg_i(x^*(c))\, Dx^*(c) = -e_i, \]

where e_i is the i-th unit vector in R^k, that is, the vector that has a 1 in the i-th place
and zeros elsewhere. It follows that

\[ DF(c) = -\sum_{i=1}^{k} \lambda_i^*(c)(-e_i) = \lambda^*(c), \]

and the proof is complete.

An economic interpretation of the result we have just derived bears mention. Since
λ*_i(c) = ∂f(x*(c))/∂c_i, a small relaxation in constraint i will raise the maximized
value of the objective function by λ*_i(c). Therefore, λ*_i(c) also represents the
maximum amount the decision-maker will be willing to pay for a marginal relaxation of
constraint i, and is the marginal value or the "shadow price" of constraint i at c.
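
The relation DF(c) = λ*(c) can be illustrated on a small problem with one constraint. The problem below is a hypothetical example chosen because it can be solved by hand: maximize f(x, y) = xy subject to g(x, y) + c = 0 with g(x, y) = −x − y, i.e., subject to x + y = c. The maximizer is x*(c) = (c/2, c/2), and the first-order condition (y, x) + λ(−1, −1) = 0 gives λ*(c) = c/2.

```python
# Shadow-price check on an assumed example: maximize x*y subject to x + y = c,
# written in the parametric form g(x, y) + c = 0 with g(x, y) = -x - y.

def f(x, y):
    return x * y

def optimum(c):
    # Closed-form maximizer of x*(c - x), obtained by hand.
    return (c / 2.0, c / 2.0)

def multiplier(c):
    # From df/dx + lambda * dg/dx = y - lambda = 0 at the optimum.
    _, y = optimum(c)
    return y

def first_order_residual(c):
    # Df + lambda * Dg, with Df = (y, x) and Dg = (-1, -1).
    x, y = optimum(c)
    lam = multiplier(c)
    return (y - lam, x - lam)

def value(c):
    # F(c) = f(x*(c)).
    return f(*optimum(c))

def derivative(fn, c, step=1e-6):
    # Central finite difference approximation to fn'(c).
    return (fn(c + step) - fn(c - step)) / (2.0 * step)

c = 3.0
dF = derivative(value, c)
lam = multiplier(c)
print(dF, lam, first_order_residual(c))
```

Here an increase in c relaxes the constraint, and the finite-difference estimate of F'(c) agrees with the multiplier c/2.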


5.3 Second-Order Conditions 


The Theorem of Lagrange gives us the first-order necessary conditions for
optimization problems with equality constraints. In this section, we describe second-order
conditions for such problems. As in the unconstrained case, these conditions pertain
only to local maxima and minima, since differentiability is only a local property.

So consider the problem of maximizing or minimizing f: R^n → R over the set
D = U ∩ {x | g(x) = 0}, where g: R^n → R^k, and U ⊂ R^n is open. We will assume
in this section that f and g are both C² functions. The following notation will
come in handy: given any λ ∈ R^k, define the function L on R^n by

\[ L(x; \lambda) = f(x) + \sum_{i=1}^{k} \lambda_i g_i(x). \]

Note that the second derivative D²L(x; λ) of L(·; λ) with respect to the x-variables
is the n × n matrix defined by

\[ D^2 L(x; \lambda) = D^2 f(x) + \sum_{i=1}^{k} \lambda_i D^2 g_i(x). \]

Since f and g are both C² functions of x, so is L(·; λ) for any given value of λ ∈ R^k.
Thus, D²L(x; λ) is a symmetric matrix and defines a quadratic form on R^n.


Theorem 5.4  Suppose there exist points x* ∈ D and λ* ∈ R^k such that ρ(Dg(x*)) =
k, and Df(x*) + Σ_{i=1}^k λ*_i Dg_i(x*) = 0. Define

\[ Z(x^*) = \{z \in \mathbb{R}^n \mid Dg(x^*) z = 0\}, \]

and let D²L* denote the n × n matrix D²L(x*; λ*) = D²f(x*) + Σ_{i=1}^k λ*_i D²g_i(x*).

1. If f has a local maximum on D at x*, then z'D²L*z ≤ 0 for all z ∈ Z(x*).

2. If f has a local minimum on D at x*, then z'D²L*z ≥ 0 for all z ∈ Z(x*).

3. If z'D²L*z < 0 for all z ∈ Z(x*) with z ≠ 0, then x* is a strict local maximum
of f on D.

4. If z'D²L*z > 0 for all z ∈ Z(x*) with z ≠ 0, then x* is a strict local minimum
of f on D.


Proof  See Section 5.7 below. □


There are obvious similarities between this result and the corresponding one for
unconstrained maximization problems (Theorem 4.3). As there, parts 1 and 2 of
this theorem are necessary conditions that must be satisfied by all local maxima
and minima, respectively, while parts 3 and 4 are sufficient conditions that identify
specific points as being local maxima or minima.

There are also two very important differences. First, the second-order conditions
are not stated in terms of only the second derivative D²f(x*) of f at x*. Rather,
we add the "correction term" Σ_{i=1}^k λ*_i D²g_i(x*), and state these second-order
conditions in terms of D²L(x*; λ*) = D²f(x*) + Σ_{i=1}^k λ*_i D²g_i(x*). Second, the stated
properties of the quadratic form D²L(x*; λ*) do not have to hold on all of R^n, but
only on the subset of R^n defined by Z(x*).

The last observation motivates our next result. In Chapter 1, we have seen that the
definiteness of a symmetric n × n matrix A can be completely characterized in terms
of the submatrices of A. We now turn to a related question: the characterization
of the definiteness of A on only the set {z ≠ 0 | Bz = 0}, where B is a k × n
matrix of rank k. Such a characterization would give us an alternative way to check
for the conditions of Theorem 5.4 by simply using the substitution A = D²L* and
B = Dg(x*).

Some notation first. For l ∈ {1, …, n}, let A_l denote the l × l submatrix of A
obtained by retaining only the first l rows and columns of A:

\[ A_l = \begin{bmatrix} a_{11} & \cdots & a_{1l} \\ \vdots & \ddots & \vdots \\ a_{l1} & \cdots & a_{ll} \end{bmatrix}. \]

Similarly, for l ∈ {1, …, n}, let B_{kl} denote the k × l matrix obtained from B by
retaining only the first l columns of B:

\[ B_{kl} = \begin{bmatrix} b_{11} & \cdots & b_{1l} \\ \vdots & \ddots & \vdots \\ b_{k1} & \cdots & b_{kl} \end{bmatrix}. \]

When k = l, we shall denote B_{kl} simply by B_k.

Next, given any permutation π of the first n integers,³ let A^π denote the n × n
symmetric matrix obtained from A by applying the permutation π to both its rows
and columns,

\[ A^{\pi} = \begin{bmatrix} a_{\pi_1 \pi_1} & \cdots & a_{\pi_1 \pi_n} \\ \vdots & \ddots & \vdots \\ a_{\pi_n \pi_1} & \cdots & a_{\pi_n \pi_n} \end{bmatrix}, \]

and let B^π denote the k × n matrix obtained by applying the permutation π to only
the columns of B:

\[ B^{\pi} = \begin{bmatrix} b_{1 \pi_1} & \cdots & b_{1 \pi_n} \\ \vdots & \ddots & \vdots \\ b_{k \pi_1} & \cdots & b_{k \pi_n} \end{bmatrix}. \]

³Recall from Chapter 1 that a permutation π of the integers {1, …, n} is simply a reordering of the integers.
The notation π_k is used to denote the new integer in the position k under the ordering π. Thus, for instance,
if we represent the permutation (2, 1, 3) of the set {1, 2, 3} by π, we have π_1 = 2, π_2 = 1, and π_3 = 3.

In an obvious extension of this notation, A^π_l will be the l × l submatrix obtained
from A^π by retaining only the first l rows and l columns of A^π, and B^π_{kl} will denote
the k × l submatrix of B^π obtained by retaining only the first l columns of B^π.

Finally, given any l ∈ {1, …, n}, let C_l be the (k + l) × (k + l) matrix obtained
by "bordering" the submatrix A_l by the submatrix B_{kl} in the following manner:

\[ C_l = \begin{bmatrix} 0_k & B_{kl} \\ B_{kl}' & A_l \end{bmatrix}, \]

where 0_k is a null k × k matrix, and B'_{kl} is the transpose of B_{kl}. In full notation, we
have

\[ C_l = \begin{bmatrix}
0 & \cdots & 0 & b_{11} & \cdots & b_{1l} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & b_{k1} & \cdots & b_{kl} \\
b_{11} & \cdots & b_{k1} & a_{11} & \cdots & a_{1l} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
b_{1l} & \cdots & b_{kl} & a_{l1} & \cdots & a_{ll}
\end{bmatrix}. \]

Denote by C^π_l the matrix obtained similarly when A is replaced by A^π and B by B^π.

The following result shows that the behavior of the quadratic form x'Ax on the
set {z ≠ 0 | Bz = 0} can, in turn, be completely characterized by the behavior of the
bordered matrices C_l. For this result, we will assume that |B_k| ≠ 0. Since we have
assumed that ρ(B) = k, the matrix B must contain some k × k submatrix whose
determinant is nonzero; an assumption that this submatrix is the one consisting of
the first k rows and columns involves no loss of generality.


Theorem 5.5  Let A be a symmetric n × n matrix, and B a k × n matrix such that
|B_k| ≠ 0. Define the bordered matrices C_l as described above. Then,

1. x'Ax ≥ 0 for every x such that Bx = 0 if and only if for all permutations π of
the first n integers, and for all r ∈ {k + 1, …, n}, we have (−1)^k |C^π_r| ≥ 0.

2. x'Ax ≤ 0 for all x such that Bx = 0 if and only if for all permutations π of the
first n integers, and for all r ∈ {k + 1, …, n}, we have (−1)^r |C^π_r| ≥ 0.

3. x'Ax > 0 for all x ≠ 0 such that Bx = 0 if and only if for all r ∈ {k + 1, …, n},
we have (−1)^k |C_r| > 0.

4. x'Ax < 0 for all x ≠ 0 such that Bx = 0 if and only if for all r ∈ {k + 1, …, n},
we have (−1)^r |C_r| > 0.


Proof  See Debreu (1952, Theorem 4, p.297, Theorem 5, p.298, and Theorems 9
and 10, p.299). □


Note the important difference between parts 1 and 3 of Theorem 5.5 on the one
hand, and parts 2 and 4 on the other. In parts 1 and 3, the term −1 is raised to the fixed
power k, so that the signs of the determinants |C_r| are required to all be the same; in
parts 2 and 4, the term −1 is raised to the power r, so the signs of the determinants
|C_r| are required to alternate.
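
To make the bordered-determinant test concrete, the sketch below applies part 4 of Theorem 5.5 to a small example of our own choosing: A = [[0, 1], [1, 0]] and B = [1, 1], so n = 2 and k = 1. On the set {x | Bx = 0} we have x_2 = −x_1 and x'Ax = −2x_1², so the form should pass the test for negative definiteness. The helper names `determinant` and `bordered_matrix` are hypothetical.

```python
# Bordered-matrix test (part 4 of Theorem 5.5) on an assumed 2 x 2 example.

def determinant(matrix):
    # Laplace expansion along the first row; adequate for small matrices.
    size = len(matrix)
    if size == 1:
        return matrix[0][0]
    total = 0
    for col in range(size):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * determinant(minor)
    return total

def bordered_matrix(A, B, r):
    # C_r = [[0_k, B_kr], [B_kr', A_r]], using the first r columns of B
    # and the leading r x r block of A.
    k = len(B)
    top = [[0] * k + B[i][:r] for i in range(k)]
    bottom = [[B[i][j] for i in range(k)] + A[j][:r] for j in range(r)]
    return top + bottom

A = [[0, 1], [1, 0]]
B = [[1, 1]]
k, n = 1, 2

C2 = bordered_matrix(A, B, 2)
det_C2 = determinant(C2)

# Part 4: (-1)^r |C_r| > 0 for every r in {k+1, ..., n}.
negative_definite_on_constraint = all(
    (-1) ** r * determinant(bordered_matrix(A, B, r)) > 0
    for r in range(k + 1, n + 1)
)

# Direct check at x = (1, -1), which satisfies Bx = 0.
x = (1, -1)
direct_value = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
print(C2, det_C2, negative_definite_on_constraint, direct_value)
```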

When A = D²L* and B = Dg(x*), the matrices C_r are called "bordered
hessians." This term arises from the fact that C_r is constructed by bordering an r × r
submatrix of the hessian D²L*, with terms obtained from the matrix Dg(x*).⁴

⁴Recall from Chapter 1 that the second derivative D²h of a C² function h is called the hessian of h.

The use of the second-order conditions in applications is illustrated in Section 5.5
below, where we work out examples based on economic problems. Subsection 5.5.1
illustrates the use of Theorem 5.4 in the context of a utility maximization
problem, while subsection 5.5.2 does likewise for Theorem 5.5 in the context of a cost
minimization problem.


5.4 Using the Theorem of Lagrange 
5.4.1 A “Cookbook” Procedure 


Let an equality-constrained optimization problem of the form

Maximize f(x) subject to x ∈ D = U ∩ {x | g(x) = 0}

be given, where f: R^n → R, and g: R^n → R^k are C¹ functions, and U ⊂ R^n is
open. We describe here a "cookbook" procedure for using the Theorem of Lagrange to
solve this maximization problem. This procedure, which we shall call the Lagrangean
method, involves three steps.

In the first step, we set up a function L: D × R^k → R, called the Lagrangean,
defined by:

\[ L(x, \lambda) = f(x) + \sum_{i=1}^{k} \lambda_i g_i(x). \]

The vector λ = (λ_1, …, λ_k) ∈ R^k is called the vector of Lagrange multipliers.

As the second step in the procedure, we find the set of all critical points of L(x, λ)
for which x ∈ U, i.e., all points (x, λ) at which DL(x, λ) = 0 and x ∈ U. Since
x ∈ R^n and λ ∈ R^k, the condition that DL(x, λ) = 0 results in a system of (n + k)
equations in the (n + k) unknowns:

\[ \frac{\partial L}{\partial x_j}(x, \lambda) = 0, \quad j = 1, \ldots, n, \qquad \frac{\partial L}{\partial \lambda_i}(x, \lambda) = 0, \quad i = 1, \ldots, k. \]

Let M be the set of all solutions to these equations for which x ∈ U:

\[ M = \{(x, \lambda) \mid x \in U,\ DL(x, \lambda) = 0\}. \]

As the third and last step in the procedure, we evaluate f at each point x in the set

\[ \{x \in \mathbb{R}^n \mid \text{there is } \lambda \text{ such that } (x, \lambda) \in M\}. \]

In practice, the values of x which maximize f over this set are also usually the
solutions to the equality-constrained maximization problem we started with.
The steps to be followed for a minimization problem are identical to the ones for
a maximization problem, with the sole difference that at the last step, we select the
value of x that minimizes f over the set {x | there is λ such that (x, λ) ∈ M}.
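
The three steps can be traced on a small problem. The following sketch uses an example of our own choosing: maximize f(x, y) = x² − y² subject to g(x, y) = 1 − x² − y² = 0, with U = R². The four critical points of L listed in `candidates` were obtained by hand from the equations x(1 − λ) = 0, y(1 + λ) = 0, and x² + y² = 1; the code only verifies them and carries out the comparison in the third step.

```python
# Cookbook procedure on an assumed example:
# maximize f(x, y) = x^2 - y^2 subject to g(x, y) = 1 - x^2 - y^2 = 0.

def f(x, y):
    return x * x - y * y

def grad_L(x, y, lam):
    # Step 1: L(x, y, lam) = f(x, y) + lam * g(x, y); these are its partials
    # with respect to x, y, and lam.
    return (2 * x - 2 * lam * x, -2 * y - 2 * lam * y, 1 - x * x - y * y)

# Step 2: solutions of DL = 0, derived by hand for this example.
candidates = [
    (1.0, 0.0, 1.0),
    (-1.0, 0.0, 1.0),
    (0.0, 1.0, -1.0),
    (0.0, -1.0, -1.0),
]
M = [p for p in candidates if all(abs(r) < 1e-12 for r in grad_L(*p))]

# Step 3: evaluate f at the x-part of each point of M and keep the largest value.
values = {(x, y): f(x, y) for (x, y, _) in M}
best_value = max(values.values())
maximizers = sorted(p for p, v in values.items() if v == best_value)
print(best_value, maximizers)
```

In this example the constraint set is a circle, so a maximum exists by the Weierstrass Theorem, and Dg(x, y) = (−2x, −2y) is nonzero at every feasible point, so the procedure can be expected to succeed.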


5.4.2 Why the Procedure Usually Works 


It is important to understand why the Lagrangean method typically succeeds in
identifying the desired optima, and equally importantly, when it may fail. The key to
both questions lies in the following property of the set of critical points of L:

The set of all critical points of L contains the set of all local maxima and minima of f on D
at which the constraint qualification is met. That is, if x̂ is any local maximum or minimum
of f on D, and if the constraint qualification holds at x̂, then there exists λ̂ such that (x̂, λ̂)
is a critical point of L.

Indeed, this is an immediate consequence of the definition of L. For (x, λ) to be a
critical point of L, we must have

\[ \frac{\partial L}{\partial \lambda_i}(x, \lambda) = g_i(x) = 0 \]

for i = 1, …, k, as well as

\[ \frac{\partial L}{\partial x_j}(x, \lambda) = \frac{\partial f}{\partial x_j}(x) + \sum_{i=1}^{k} \lambda_i \frac{\partial g_i}{\partial x_j}(x) = 0 \]

for j = 1, …, n. Thus, (x, λ) is a critical point of L if and only if it meets the
first-order conditions of the Theorem of Lagrange, i.e., it satisfies both g(x) = 0 as well
as Df(x) + Σ_{i=1}^k λ_i Dg_i(x) = 0.

Now, suppose x̂ is a local optimum, and the constraint qualification holds at x̂.
Then, we must certainly have g(x̂) = 0, since x̂ must be feasible. By the Theorem
of Lagrange, there also exists λ̂ such that Df(x̂) + Σ_{i=1}^k λ̂_i Dg_i(x̂) = 0. This states
precisely that (x̂, λ̂) must be a critical point of L, and establishes the claimed property.

A particular implication of this property is the following, which we state in the 
form of a proposition for ease of future reference: 


Proposition 5.6  Suppose the following two conditions hold:

1. A global optimum x* (i.e., a global maximum or, as required, a global minimum)
exists to the given problem.

2. The constraint qualification is met at x*.

Then, there exists λ* such that (x*, λ*) is a critical point of L.

It follows that under the two conditions of this proposition, the Lagrangean method
will succeed in identifying the optimum x*. Indirectly, the conditions of this
proposition also explain why the Lagrangean method usually works in practice. In most
applications, the existence of a solution is not a problem, and neither typically is the
constraint qualification. In particular, it is often possible to verify existence a priori,
say, by an appeal to the Weierstrass Theorem. Although it is not, in general, possible
to do likewise for the constraint qualification (since this depends on the properties
of Dg at the unknown optimal point x*), it is quite often the case that the constraint
qualification holds everywhere on the feasible set D, so this is not a problem either.
(The utility-maximization problem is a typical case in point. See subsection 5.5.1
below.)


5.4.3 When It Could Fail 
Unfortunately, when the conditions of Proposition 5.6 fail to hold, the procedure 
could also fail to identify global optima. This subsection provides a number of ex- 
amples to illustrate this point. 

First, if an optimum exists but the constraint qualification is not met at the optimum, 
then the optimum need not appear as part of a critical point of L. Of course, this 
does not imply that L will have no critical points; the conditions of the Theorem 
of Lagrange are only necessary, so many critical points could exist. The following 
examples illustrate these points. In each example, a unique global maximum exists, 
and in each case the constraint qualification fails at this optimum. In the first example, 
this results in a situation where there are no solutions at all to the equations that define 
the critical points of L; in the second example, on the other hand, multiple solutions 
exist to these equations, but the problem’s unique maximum is not one of these. 


Example 5.7  As in Example 5.3, let f and g be functions on R² given by f(x, y) =
−y and g(x, y) = y³ − x², respectively. We saw earlier that the unique global
maximum of f on D = {(x, y) | g(x, y) = 0} is at (x, y) = (0, 0), but that the
constraint qualification fails at this point; and, consequently, that there is no λ such
that Df(0, 0) + λDg(0, 0) = 0. Evidently, then, there is no choice of λ for which
the point (0, 0, λ) turns up as a solution to the critical points of the Lagrangean L in
this problem.

Indeed, the critical points of L in this problem admit no solution at all: if we define
L(x, y, λ) = f(x, y) + λg(x, y) = −y + λ(y³ − x²), the critical points of L are
the solutions (x, y, λ) to the following system of equations:

\[ \begin{aligned} -2\lambda x &= 0 \\ -1 + 3\lambda y^2 &= 0 \\ -x^2 + y^3 &= 0. \end{aligned} \]

For the first equation to be satisfied, we must have x = 0 or λ = 0. If λ = 0, the
second equation cannot be satisfied. If x = 0, the third equation implies y = 0, and
once again the second equation cannot be satisfied. □


Example 5.8 Let $f$ and $g$ be functions on $\mathbb{R}^2$ defined by $f(x, y) = 2x^3 - 3x^2$, and
$g(x, y) = (3 - x)^3 - y^2$, respectively. Consider the problem of maximizing $f$ over
the set $D = \{(x, y) \in \mathbb{R}^2 \mid g(x, y) = 0\}$.

Since the constraint requires that $(3 - x)^3 = y^2$, and since $y^2 \geq 0$, it is easy to
see that the largest value of $x$ on the feasible set is $x = 3$, which occurs at $y = 0$.
A little calculation also shows that $f$ is nonpositive for $x$ in the interval
$(-\infty, 3/2]$, and is strictly positive and strictly increasing for $x > 3/2$.⁵ It follows
from these statements that $f$ attains a global maximum on $D$ at the point $(x, y) = (3, 0)$.
Note that since $Dg(x, y) = (-3(3 - x)^2, -2y)$, we have $Dg(3, 0) = (0, 0)$, so the
constraint qualification fails at this unique global maximum. We will show that, as a
consequence, the cookbook procedure will fail to identify this point.

The Lagrangean $L$ for this problem has the form
$L(x, y, \lambda) = 2x^3 - 3x^2 + \lambda((3 - x)^3 - y^2)$, so the critical points of $L$
are the solutions $(x, y, \lambda)$ to

$$
\begin{aligned}
6x^2 - 6x - 3\lambda(3 - x)^2 &= 0 \\
-2\lambda y &= 0 \\
(3 - x)^3 - y^2 &= 0.
\end{aligned}
$$

For the second condition to be met, we must have $\lambda = 0$ or $y = 0$. If $y = 0$, then the
third condition implies $x = 3$, but $x = 3$ violates the first condition. This leaves the
case $\lambda = 0$, and it is easily checked that there are precisely two solutions (up to the
sign of $y$) that arise in this case, namely, $(x, y, \lambda) = (0, \pm\sqrt{27}, 0)$ and
$(x, y, \lambda) = (1, \pm\sqrt{8}, 0)$. In particular, the unique global maximum of the
problem, $(x, y) = (3, 0)$, does not turn up as part of a solution to the critical points
of $L$. □
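The conclusion of Example 5.8 can be checked numerically. The following is a minimal sketch in Python; the function names `f` and `residuals` are illustrative choices and do not come from the text. It confirms that the points found with $\lambda = 0$ solve the three equations, and that the first equation cannot hold at $(3, 0)$ for the sampled values of $\lambda$ (the $\lambda$-term vanishes there).

```python
import math

def f(x, y):
    # Objective of Example 5.8 (it does not depend on y).
    return 2 * x**3 - 3 * x**2

def residuals(x, y, lam):
    # Left-hand sides of the three equations defining the critical points of L.
    return (
        6 * x**2 - 6 * x - 3 * lam * (3 - x)**2,
        -2 * lam * y,
        (3 - x)**3 - y**2,
    )

# Candidate critical points with lam = 0 (both signs of y).
candidates = [
    (0.0, math.sqrt(27), 0.0), (0.0, -math.sqrt(27), 0.0),
    (1.0, math.sqrt(8), 0.0), (1.0, -math.sqrt(8), 0.0),
]
for point in candidates:
    assert all(abs(r) < 1e-9 for r in residuals(*point))

# At (3, 0) the factor (3 - x)^2 is zero, so lam drops out of the first equation.
first_at_optimum = [residuals(3.0, 0.0, lam)[0] for lam in (-10.0, 0.0, 10.0)]

print(first_at_optimum)
print(f(3.0, 0.0), [f(x, y) for x, y, _ in candidates])
```

The value of $f$ at $(3, 0)$ exceeds its value at every candidate, which is consistent with the claim that the procedure misses the true maximum.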


Alternatively, even if the constraint qualification holds everywhere on the con- 
straint set D, the cookbook procedure may fail to identify the optimum, for the 
simple reason that an optimum may not exist. In this case, the optimum evidently 
cannot turn up as part of a critical point of L, although, once again, L may have many 
critical points. The following examples illustrate this point. In both examples, global 
optima do not exist. In the first example, it is also true that L has no critical points 
at all; in the second example, however, multiple solutions exist to the equations that 
define the critical points of L. 

⁵ It is evident that $f$ is negative when $x < 0$ because, in this case, $2x^3 < 0$ and
$-3x^2 < 0$. Since $f'(x) = 6x^2 - 6x$, a simple calculation shows that $f$ is strictly
decreasing on the interval $(0, 1)$ and is strictly increasing for $x > 1$. Since
$f(0) = f(3/2) = 0$, the claim in the text obtains.

Example 5.9 Consider the problem of maximizing and minimizing $f(x, y) = x^2 - y^2$
subject to the single constraint $g(x, y) = 1 - x - y = 0$. Observe that
$Dg(x, y) = (-1, -1)$ at any $(x, y)$, so $\rho(Dg(x, y)) = 1$ everywhere, and
the constraint qualification holds everywhere on the feasible set. Let
$L(x, y, \lambda) = f(x, y) + \lambda g(x, y)$. The critical points of $L$ are the
solutions $(x, y, \lambda)$ to:

$$
\begin{aligned}
2x - \lambda &= 0 \\
-2y - \lambda &= 0 \\
1 - x - y &= 0.
\end{aligned}
$$

If $\lambda \neq 0$, then the first two equations imply that $x = -y$, but this violates the
third equation. If $\lambda = 0$, then from the first two equations, we have $x = y = 0$, so
again the third equation cannot be met. Thus, there are no solutions to the critical points
of $L$.

Since the constraint qualification holds everywhere, it must be the case that global
maxima and minima fail to exist in this problem (otherwise, by Proposition 5.6, such
points must arise as critical points of $L$), and, indeed, it is easy to see that this is
the case. For any $x \in \mathbb{R}$, the point $(x, 1 - x)$ is in the feasible set of the
problem. Moreover, $f(x, 1 - x) = x^2 - (1 - x)^2 = 2x - 1$, so by taking $x$ large and
positive, $f$ can be made arbitrarily large, while by taking $-x$ large and positive, $f$
can be made arbitrarily negative. □


Example 5.10 Let $f$ and $g$ be functions on $\mathbb{R}^2$ defined respectively by

$$
f(x, y) = \frac{1}{3}x^3 - \frac{3}{2}y^2 + 2x
$$

and $g(x, y) = x - y$. Consider the problem of maximizing and minimizing $f$ on
the set $D = \{(x, y) \mid g(x, y) = 0\}$. Since $Dg(x, y) = (1, -1)$ at all $(x, y)$, we
have $\rho(Dg(x, y)) = 1$ at all $(x, y)$ and the constraint qualification holds everywhere
on the feasible set. If we set up the Lagrangean
$L(x, y, \lambda) = f(x, y) + \lambda g(x, y)$, the critical points of $L$ are the solutions
$(x, y, \lambda)$ to

$$
\begin{aligned}
x^2 + 2 + \lambda &= 0 \\
-3y - \lambda &= 0 \\
x - y &= 0.
\end{aligned}
$$

This system of equations has two solutions: $(x, y, \lambda) = (2, 2, -6)$ and
$(x, y, \lambda) = (1, 1, -3)$.

Evaluating $f$ at these two points, we get $f(1, 1) = 5/6$ and $f(2, 2) = 2/3$, so if
we were following the cookbook Lagrangean procedure, we would pick $(1, 1)$ as the
point where $f$ is maximized on $D$, and $(2, 2)$ as the point where $f$ is minimized on
$D$. Indeed, the second-order conditions confirm these points, in the sense that $(1, 1)$
passes the test for a strict local maximum, while $(2, 2)$ passes the test for a strict local
minimum.⁶

However, neither of these points is a global optimum of the problem. In fact, the
problem has no solutions: every feasible point has the form $(x, x)$, and
$f(x, x) \to +\infty$ as $x \to +\infty$, while $f(x, x) \to -\infty$ as $x \to -\infty$. □
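The arithmetic of Example 5.10 can be reproduced exactly with rational numbers. The sketch below is only illustrative (the helper names `f` and `residuals` and the test points $x = \pm 10$ are assumptions made here, not part of the text): it checks that both candidate points solve the critical-point equations, evaluates $f$ there, and shows that feasible points further along the line $y = x$ do better and worse than either candidate.

```python
from fractions import Fraction

def f(x, y):
    # Objective of Example 5.10: f(x, y) = x^3/3 - (3/2) y^2 + 2x.
    return Fraction(1, 3) * x**3 - Fraction(3, 2) * y**2 + 2 * x

def residuals(x, y, lam):
    # Left-hand sides of the equations defining the critical points of L.
    return (x**2 + 2 + lam, -3 * y - lam, x - y)

critical_points = [(Fraction(1), Fraction(1), Fraction(-3)),
                   (Fraction(2), Fraction(2), Fraction(-6))]
for point in critical_points:
    assert residuals(*point) == (0, 0, 0)

values = [f(x, y) for x, y, _ in critical_points]
print(values)  # objective values at the two critical points

# Feasible points have the form (x, x); moving along this line beats both candidates.
far_right = f(Fraction(10), Fraction(10))
far_left = f(Fraction(-10), Fraction(-10))
print(far_right > max(values), far_left < min(values))
```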


The examples of this subsection add a word of caution regarding the blind use of the 
Lagrangean method in solving equality-constrained problems. In many applications, 
as we pointed out earlier, it is possible to verify a priori both the existence of a 
solution, and the constraint qualification condition, and in such cases the method 
works well. Sometimes, however, such a priori knowledge may not be available, 
and this could be problematic, whether or not solutions to the critical points of the 
Lagrangean exist. 

To elaborate, the critical points of the Lagrangean could fail to have a solution for 
two very different reasons. First, as in Example 5.7, this could occur because although 
an optimum exists, the constraint qualification is violated at that point; alternatively, 
as in Example 5.9, this could be the case because an optimum does not even exist. 
Therefore, the absence of a solution to the critical points of the Lagrangean does not 
enable us to draw any conclusions about the existence, or nonexistence, of global 
optima.⁷

On the other hand, it is also possible that while solutions do exist to the critical 
points of L, none of these is the desired optimum point. Again, this could be because, 
although an optimum exists, the constraint qualification is violated at that point (as 
in Example 5.8); or because an optimum does not exist (as in Example 5.10).⁸ Thus,
even the existence of solutions to the critical points of L does not, in general, enable 
us to make inferences about the existence, or nonexistence, of optima. 


⁶ To see this, note that

$$
D^2 L(x, y) = \begin{bmatrix} 2x & 0 \\ 0 & -3 \end{bmatrix}.
$$

Note also that, since $Dg(x, y) = (1, -1)$ at all $(x, y)$, the set
$Z(x, y) = \{z \in \mathbb{R}^2 \mid Dg(x, y)z = 0\}$
is simply the set $\{z \in \mathbb{R}^2 \mid z = (w, w),\ w \in \mathbb{R}\}$, which is
independent of $(x, y)$. Denote this set by just $Z$.
To show that a point $(x, y)$ is a strict local maximum (resp. strict local minimum) of $f$
on $D$, it suffices by Theorem 5.4 to show that $z' D^2 L(x, y) z$ is strictly negative
(resp. strictly positive) for $z \in Z$ with $z \neq 0$.
At $(x, y) = (1, 1)$ and $z = (w, w) \in Z$, we have $z' D^2 L(x, y) z = -w^2 < 0$ for all
$w \neq 0$, while for $(x, y) = (2, 2)$, we have $z' D^2 L(x, y) z = w^2 > 0$ for all
$w \neq 0$. The claimed results are proved.

⁷ An exception to this statement arises when the constraint qualification is known to hold
everywhere on the constraint set $D$. In this case, if a global optimum exists, it must turn
up as part of a critical point of $L$; thus, if no critical points of $L$ exist, it must be
the case that the problem admits no solution.

⁸ Note the important point that, as evidenced by Example 5.10, the second-order conditions
are of limited help in resolving this problem. Even if a critical point of $L$ could be shown
to be a local maximum (say), this would not establish it to be a global maximum.


5.4.4 A Numerical Example 


We close this section with an illustration of the use of the Lagrangean method on
a simple numerical example. Consider the problem of maximizing and minimizing
$f(x, y) = x^2 - y^2$ subject to the single constraint $g(x, y) = 1 - x^2 - y^2 = 0$. The
constraint set

$$
D = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}
$$

is simply the unit circle in $\mathbb{R}^2$.

We will first show that the two conditions of Proposition 5.6 (namely, existence of
global optima, and the constraint qualification) are both met, so the critical points of
the Lagrangean $L$ will, in fact, contain the set of global maxima and global minima.
That solutions exist is easy to see: $f$ is a continuous function on $D$, and $D$ is evidently
compact, so an appeal to the Weierstrass Theorem yields the desired conclusion. To
check the constraint qualification, note that the derivative of the constraint function
$g$ at any $(x, y) \in \mathbb{R}^2$ is given by $Dg(x, y) = (-2x, -2y)$. Since $x$ and $y$ cannot
be zero simultaneously on $D$ (otherwise, $x^2 + y^2 = 1$ would fail), we must have
$\rho(Dg(x, y)) = 1$ at all $(x, y) \in D$. Therefore, the constraint qualification holds
everywhere on $D$.

Now set up the Lagrangean $L(x, y, \lambda) = x^2 - y^2 + \lambda(1 - x^2 - y^2)$. The critical
points of $L$ are the solutions $(x, y, \lambda) \in \mathbb{R}^3$ to

$$
\begin{aligned}
2x - 2\lambda x &= 0 \\
-2y - 2\lambda y &= 0 \\
x^2 + y^2 &= 1.
\end{aligned}
$$

From the first equation we have $2x(1 - \lambda) = 0$, while from the second, we have
$2y(1 + \lambda) = 0$. If $\lambda \neq \pm 1$, these can hold only if $x = y = 0$, but this violates
the third equation. So $\lambda = \pm 1$. It easily follows that there are only four possible
solutions:

$$
(x, y, \lambda) =
\begin{cases}
(+1, 0, +1) \\
(-1, 0, +1) \\
(0, +1, -1) \\
(0, -1, -1).
\end{cases}
$$

Evaluating $f$ at the $(x, y)$ coordinates of these four points, we see that
$f(1, 0) = f(-1, 0) = 1$, while $f(0, 1) = f(0, -1) = -1$. Since the critical points of $L$
contain the global maxima and minima of $f$ on $D$, the first two points must be global
maximizers of $f$ on $D$, while the latter two are global minimizers of $f$ on $D$.
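As a rough cross-check of this example, the sketch below verifies the four critical points and compares the candidate values with a brute-force sample of the unit circle. The sampling resolution (3600 points) and the helper names are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    return x**2 - y**2

def residuals(x, y, lam):
    # Left-hand sides of the equations defining the critical points of L,
    # with the constraint written as 1 - x^2 - y^2 = 0.
    return (2 * x - 2 * lam * x, -2 * y - 2 * lam * y, 1 - x**2 - y**2)

critical_points = [(1, 0, 1), (-1, 0, 1), (0, 1, -1), (0, -1, -1)]
for point in critical_points:
    assert residuals(*point) == (0, 0, 0)

values = [f(x, y) for x, y, _ in critical_points]

# Sample the unit circle to compare against the candidate values.
samples = [f(math.cos(t), math.sin(t))
           for t in (2 * math.pi * i / 3600 for i in range(3600))]
print(values, max(samples), min(samples))
```

The sampled maximum and minimum agree with the values of $f$ at the critical points, as Proposition 5.6 leads one to expect here.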
5.5 Two Examples from Economics


We now present two familiar examples drawn from consumer theory and producer 
theory, respectively, namely, those of utility maximization and cost minimization. 
The presentation here achieves two purposes: 


1. As with most problems in economic theory, these problems are characterized 
by inequality, rather than equality constraints. We show that, under suitable 
conditions, it may be possible to reduce such problems to equivalent equality- 
constrained problems, and thereby to bring them within the framework of the 
Theorem of Lagrange. 

2. We demonstrate the use of the Lagrangean L in determining the solution to 
the transformed equality-constrained maximization problems. In particular, we 
show how to 


(a) use the critical points of L to identify candidate optimal points in the reduced
equality-constrained problem;

(b) use the second-order conditions to determine if a critical point of L is a local 
optimum; and, finally, 

(c) combine the Weierstrass Theorem and the Theorem of Lagrange to identify 
global optima of the reduced, and thereby the original, problem. 


Of course, the phrase “under suitable conditions” is important for the transformation 
of inequality-constrained problems into equality-constrained ones. See the remarks 
at the end of this section. 


5.5.1 An Illustration from Consumer Theory 


The utility-maximization example we consider here involves a consumer who consumes
two goods, and whose utility from consuming an amount $x_i$ of commodity
$i = 1, 2$, is given by $u(x_1, x_2) = x_1 x_2$. The consumer has an income $I > 0$, and the
price of commodity $i$ is $p_i > 0$. The problem is to solve:

$$
\max\{x_1 x_2 \mid I - p_1 x_1 - p_2 x_2 \geq 0,\ x_1 \geq 0,\ x_2 \geq 0\}.
$$

We proceed with the analysis in three steps:


Step 1: Reduction to an Equality-Constrained Problem 


As stated, this utility-maximization problem is not an equality-constrained one. We
begin by transforming it in a manner that will facilitate use of the Lagrangean method.
To this end, note that the budget set

$$
B(p, I) = \{(x_1, x_2) \mid I - p_1 x_1 - p_2 x_2 \geq 0,\ x_1 \geq 0,\ x_2 \geq 0\}
$$

is a compact set. The utility function $u(x_1, x_2) = x_1 x_2$ is evidently continuous on
this set, so by the Weierstrass Theorem a solution $(x_1^*, x_2^*)$ does exist to the given
maximization problem.

Now, if either $x_1 = 0$ or $x_2 = 0$, then $u(x_1, x_2) = 0$. On the other hand, the
consumption point $(\bar{x}_1, \bar{x}_2) = (I/(2p_1), I/(2p_2))$, which divides the available
income $I > 0$ equally between the two commodities, is a feasible consumption point,
which satisfies $u(\bar{x}_1, \bar{x}_2) = \bar{x}_1 \bar{x}_2 > 0$. Since any solution
$(x_1^*, x_2^*)$ must satisfy $u(x_1^*, x_2^*) \geq u(\bar{x}_1, \bar{x}_2)$, it follows that
any solution $(x_1^*, x_2^*)$ must satisfy $x_i^* > 0$, $i = 1, 2$. Moreover, any solution must
also meet the income constraint with equality (i.e., we must have
$p_1 x_1^* + p_2 x_2^* = I$) or total utility could be increased.

Combining these observations, we see that $(x_1^*, x_2^*)$ is a solution to the original
problem if and only if it is a solution to the problem

$$
\max\{x_1 x_2 \mid p_1 x_1 + p_2 x_2 = I,\ x_1, x_2 > 0\}.
$$

The constraint set of this reduced problem, denoted, say, $B^*(p, I)$, can equivalently
be written as

$$
B^*(p, I) = \mathbb{R}^2_{++} \cap \{(x_1, x_2) \mid I - p_1 x_1 - p_2 x_2 = 0\},
$$

and by setting $U = \mathbb{R}^2_{++}$, and $g(x_1, x_2) = I - p_1 x_1 - p_2 x_2$, we are now
within the setting of the Theorem of Lagrange.

Step 2: Obtaining the Critical Points of L

We set up the Lagrangean

$$
L(x_1, x_2, \lambda) = x_1 x_2 + \lambda(I - p_1 x_1 - p_2 x_2).
$$

The critical points of $L$ are the solutions $(x_1, x_2, \lambda) \in \mathbb{R}^2_{++} \times \mathbb{R}$ to:

$$
\begin{aligned}
x_2 - \lambda p_1 &= 0 \\
x_1 - \lambda p_2 &= 0 \\
I - p_1 x_1 - p_2 x_2 &= 0.
\end{aligned}
$$

If $\lambda = 0$, this system of equations has no solutions, since we must then have
$x_1 = x_2 = 0$ from the first and second equations, but this violates the third equation. So
suppose $\lambda \neq 0$. From the first two equations, we then have
$\lambda = x_1/p_2 = x_2/p_1$, so $x_1 = p_2 x_2/p_1$. Using this in the third equation, we see
that the unique solution to this set of equations is given by $x_1^* = I/(2p_1)$,
$x_2^* = I/(2p_2)$, and $\lambda^* = I/(2p_1 p_2)$.

Step 3: Classifying the Critical Points of L

As a first step in classifying the single critical point of $L$, we show how the
second-order conditions may be used to check that $(x_1^*, x_2^*)$ is a strict local maximum
of $u$ on $B^*(p, I)$. We begin by noting that $Dg(x_1^*, x_2^*) = (-p_1, -p_2)$, so the set

$$
Z(x^*) = \{z \in \mathbb{R}^2 \mid Dg(x^*)z = 0\}
$$

is simply

$$
Z(x^*) = \left\{z \in \mathbb{R}^2 \,\middle|\, z_1 = -\frac{p_2 z_2}{p_1}\right\}.
$$

Defining $D^2 L^* = D^2 u(x^*) + \lambda^* D^2 g(x^*)$, we have

$$
D^2 L^* = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
+ \lambda^* \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

So for any $z \in \mathbb{R}^2$, we have $z' D^2 L^* z = 2 z_1 z_2$. In particular, for any
$z \in Z(x^*)$ with $z \neq 0$, we have $z' D^2 L^* z = -2 p_2 z_2^2 / p_1 < 0$. Thus, by
Theorem 5.4, $(x_1^*, x_2^*)$ is a strict local maximum of $u$ on $B^*(p, I)$.

However, we can actually show the stronger result that $(x_1^*, x_2^*)$ is a global maximum
of $u$ on $B^*(p, I)$ by showing that the conditions of Proposition 5.6 are met.
First, note that a solution (i.e., a global maximum) does exist in the problem

$$
\max\{x_1 x_2 \mid (x_1, x_2) \in B^*(p, I)\},
$$

by the arguments given in Step 1. Second, note that the single constraint

$$
g(x_1, x_2) = I - p_1 x_1 - p_2 x_2
$$

of this problem satisfies $Dg(x_1, x_2) = (-p_1, -p_2)$ everywhere on $B^*(p, I)$. Since
$p_1, p_2 > 0$ by hypothesis, we have $\rho(Dg(x_1, x_2)) = 1$ at all
$(x_1, x_2) \in B^*(p, I)$. In particular, the constraint qualification holds at the global
maximum. Therefore, by Proposition 5.6, the global maximum must be part of a critical point
of the Lagrangean $L$ in this problem. There is a unique critical point, with associated
$x$-values $(x_1^*, x_2^*)$, so these $x$-values must represent the problem's global maximum.
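The three steps above can be sketched in a few lines of Python. The prices and income used below are hypothetical values chosen only for illustration, and the function names are not from the text. The code evaluates the closed-form critical point, confirms that it solves the critical-point equations exactly, and compares its utility with that of other bundles that exhaust the budget.

```python
from fractions import Fraction

def critical_point(p1, p2, income):
    # Closed-form solution from Step 2: x1* = I/(2 p1), x2* = I/(2 p2),
    # lambda* = I/(2 p1 p2).
    return income / (2 * p1), income / (2 * p2), income / (2 * p1 * p2)

def residuals(x1, x2, lam, p1, p2, income):
    # Left-hand sides of the three equations defining the critical points of L.
    return (x2 - lam * p1, x1 - lam * p2, income - p1 * x1 - p2 * x2)

# Hypothetical prices and income, chosen only for illustration.
p1, p2, income = Fraction(2), Fraction(5), Fraction(100)
x1, x2, lam = critical_point(p1, p2, income)
assert residuals(x1, x2, lam, p1, p2, income) == (0, 0, 0)

# Compare with other bundles that spend all income (x1 on a grid, x2 implied).
best = x1 * x2
grid = [income * Fraction(i, 1000) / p1 for i in range(1, 1000)]
others = [t * (income - p1 * t) / p2 for t in grid]
print(x1, x2, lam, best, max(others))
```

A grid comparison of this kind is not a proof of global optimality; that role is played by the Weierstrass Theorem together with Proposition 5.6.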


5.5.2 An Illustration from Producer Theory 
We now turn to the cost-minimization problem faced by a firm, which uses two inputs
$x_1$ and $x_2$ to produce a single output $y$ through the production function
$y = g(x_1, x_2)$. The unit prices of $x_1$ and $x_2$ are $w_1 \geq 0$ and $w_2 \geq 0$
respectively. The firm wishes to find the cheapest input combination for producing
$\bar{y}$ units of output, where $\bar{y} > 0$ is given. Let $X(\bar{y})$ denote the set of
feasible input combinations:

$$
X(\bar{y}) = \{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 x_2 = \bar{y},\ x_1 \geq 0,\ x_2 \geq 0\}.
$$

The firm's optimization problem is:

Minimize $w_1 x_1 + w_2 x_2$ subject to $(x_1, x_2) \in X(\bar{y})$.


We will assume throughout that $w_1 > 0$ and $w_2 > 0$ (otherwise, it is apparent that
a solution cannot exist). Once again, we proceed in three steps.

Step 1: Reduction to an Equality-Constrained Problem

In this problem, this first step is trivial. If $x_1 = 0$ or $x_2 = 0$, we have
$x_1 x_2 = 0$, so the inequality constraints have no bite, and the constraint set is actually
given by

$$
X^*(\bar{y}) = \mathbb{R}^2_{++} \cap \{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 x_2 = \bar{y}\}.
$$

Setting $f(x_1, x_2) = w_1 x_1 + w_2 x_2$, $g(x_1, x_2) = \bar{y} - x_1 x_2$, and
$U = \mathbb{R}^2_{++}$, we are in the framework of the Theorem of Lagrange.

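Step 2: Obtaining the Critical Points of L

Define the Lagrangean
$L(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda g(x_1, x_2) = w_1 x_1 + w_2 x_2 + \lambda(\bar{y} - x_1 x_2)$.
The critical points of $L$ are the solutions to:

$$
\begin{aligned}
w_1 - \lambda x_2 &= 0 \\
w_2 - \lambda x_1 &= 0 \\
\bar{y} - x_1 x_2 &= 0.
\end{aligned}
$$

From the first (or the second) equation, $\lambda$ cannot be zero. Therefore, from the first
two equations we have

$$
\lambda = \frac{w_1}{x_2} = \frac{w_2}{x_1}, \quad \text{or} \quad x_2 = \frac{w_1 x_1}{w_2}.
$$

Substituting this into the third equation, we obtain the unique solution

$$
x_1^* = \left(\frac{w_2 \bar{y}}{w_1}\right)^{1/2}, \quad
x_2^* = \left(\frac{w_1 \bar{y}}{w_2}\right)^{1/2}, \quad \text{and} \quad
\lambda^* = \left(\frac{w_1 w_2}{\bar{y}}\right)^{1/2}.
$$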


Step 3: Classifying the Critical Points of L

Once again, we begin with a demonstration that $(x_1^*, x_2^*)$ is a strict local minimum of
$f(x_1, x_2) = w_1 x_1 + w_2 x_2$ on $X^*(\bar{y})$. This time, we employ Theorem 5.5 to
achieve this end. First, note that

$$
Dg(x_1^*, x_2^*) = (-x_2^*, -x_1^*).
$$

Next, note that the matrix $D^2 L^* = D^2 f(x^*) + \lambda^* D^2 g(x^*)$ is given by

$$
D^2 L^* = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
+ \lambda^* \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & -\lambda^* \\ -\lambda^* & 0 \end{bmatrix}.
$$

By Theorem 5.4, if we show that $z' D^2 L^* z > 0$ for all $z \neq 0$ such that
$Dg(x^*) z = 0$, then we would have established that $x^*$ is a strict local minimum of $f$
on $D$. By Theorem 5.5, showing that $z' D^2 L^* z > 0$ for all $z \neq 0$ such that
$Dg(x^*) z = 0$ is the same as showing that the determinant of the following matrix is
negative:

$$
C_2 = \begin{bmatrix} 0 & Dg(x^*) \\ Dg(x^*)' & D^2 L^* \end{bmatrix}.
$$

Substituting for $Dg(x^*)$ and $D^2 L^*$, we obtain:

$$
C_2 = \begin{bmatrix}
0 & -x_2^* & -x_1^* \\
-x_2^* & 0 & -\lambda^* \\
-x_1^* & -\lambda^* & 0
\end{bmatrix}.
$$

Some simple calculation shows that $|C_2| = -2\lambda^* x_1^* x_2^* < 0$, as required. Thus,
$(x_1^*, x_2^*)$ is a strict local minimum of $f$ on $D$.

As in the utility-maximization problem, however, we can show the stronger result
that $(x_1^*, x_2^*)$ is a global minimum of $f$ on $X^*(\bar{y})$, by proving that the
problem meets the conditions of Proposition 5.6. The existence issue is a little trickier,
because the feasible action set $X^*(\bar{y})$ is noncompact. However, as explained in
Chapter 3, we can reduce the problem to one with a compact action set as follows. Let
$(\hat{x}_1, \hat{x}_2) = (\bar{y}^{1/2}, \bar{y}^{1/2})$. Note that
$\hat{x}_1 \hat{x}_2 = \bar{y}$, so $(\hat{x}_1, \hat{x}_2) \in X^*(\bar{y})$. Define
$\hat{c} = w_1 \hat{x}_1 + w_2 \hat{x}_2$. Let $\bar{x}_1 = 2\hat{c}/w_1$ and
$\bar{x}_2 = 2\hat{c}/w_2$. Then, it is clear that the firm will never use
more than $\bar{x}_i$ units of input $i$, since the total cost from doing so will strictly
exceed $\hat{c}$ (no matter what the quantity of the other input used), while the input
combination $(\hat{x}_1, \hat{x}_2)$ can produce the desired output at a cost of $\hat{c}$.
Thus, we may, without loss, restrict the feasible action set to

$$
\hat{X}(\bar{y}) = \{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 x_2 = \bar{y},\ x_i \in [0, \bar{x}_i],\ i = 1, 2\},
$$

so that $f(x_1, x_2) = w_1 x_1 + w_2 x_2$ achieves a global minimum on $X^*(\bar{y})$ if and
only if it achieves a global minimum on $\hat{X}(\bar{y})$. Since $\hat{X}(\bar{y})$ is
evidently compact, and $f$ is continuous on this set, a minimum certainly exists on
$\hat{X}(\bar{y})$ by the Weierstrass Theorem; therefore, a minimum of $f$ on
$X^*(\bar{y})$ also exists.

That the constraint qualification condition is met is easier to see. The problem's
single constraint $g(x_1, x_2) = \bar{y} - x_1 x_2$ satisfies
$Dg(x_1, x_2) = (-x_2, -x_1)$, and since neither $x_1$ nor $x_2$ can be zero on
$X^*(\bar{y})$ (otherwise, $x_1 x_2 = \bar{y} > 0$ is violated), we have
$\rho(Dg(x_1, x_2)) = 1$ everywhere on $X^*(\bar{y})$. In particular, the constraint
qualification must hold at the global minimum.

Since both conditions of Proposition 5.6 are met, the global minimum must turn
up as part of the solution to the critical points of $L$. Since only a single solution exists
to these critical points, it follows that the point $(x_1^*, x_2^*)$ is, in fact, a global
minimum of $f$ on $D$.
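The calculations of this subsection can be reproduced numerically. The sketch below is illustrative only: the input prices and output target are hypothetical, and `cost_minimizer` and `det3` are helper names introduced here. It evaluates the closed-form critical point, computes the determinant of the bordered matrix $C_2$, and compares the candidate cost with the cost at other points on the isoquant $x_1 x_2 = \bar{y}$.

```python
import math

def cost_minimizer(w1, w2, y):
    # Closed-form critical point from Step 2.
    x1 = math.sqrt(w2 * y / w1)
    x2 = math.sqrt(w1 * y / w2)
    lam = math.sqrt(w1 * w2 / y)
    return x1, x2, lam

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical input prices and output target, for illustration only.
w1, w2, y = 4.0, 1.0, 9.0
x1, x2, lam = cost_minimizer(w1, w2, y)

# Bordered matrix C_2 built from Dg(x*) = (-x2*, -x1*) and D^2 L*.
c2 = [[0.0, -x2, -x1],
      [-x2, 0.0, -lam],
      [-x1, -lam, 0.0]]
bordered_det = det3(c2)

# Compare the candidate cost with other points on the isoquant x1 * x2 = y.
cost_star = w1 * x1 + w2 * x2
other_costs = [w1 * t + w2 * y / t for t in (0.1 * k for k in range(1, 200))]
print(x1, x2, lam, bordered_det, cost_star, min(other_costs))
```

The determinant comes out negative and equal to $-2\lambda^* x_1^* x_2^*$ up to rounding, in line with the second-order test described in Step 3.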


5.5.3 Remarks 


At the beginning of this section, it was mentioned that the reduction of
inequality-constrained problems to equality-constrained ones was possible only under suitable
conditions. The purpose of this subsection is to emphasize this point through three
examples, which demonstrate the danger of attempting such a reduction by ignoring
nonnegativity constraints, when ignoring these constraints may not be a legitimate
option. In the first two examples, the problems that arise take on the form
of inconsistent equations defining the critical points of the Lagrangean. In the last
example, the Lagrangean admits a unique solution, but this is not the solution to
the problem.


Example 5.11 Consider a utility maximization problem as in subsection 5.5.1, but
where the utility function $u(x_1, x_2) = x_1 x_2$ is replaced with
$v(x_1, x_2) = x_1 + x_2$. That is, we are to solve

Maximize $x_1 + x_2$ subject to $(x_1, x_2) \in B(p, I)$.


It is still true that the utility function is continuous and the budget set is compact,
so the Weierstrass Theorem guarantees us a maximum. It also remains true that any
maximum $(x_1^*, x_2^*)$ must satisfy $I - p_1 x_1^* - p_2 x_2^* = 0$. Nonetheless, it is
not possible to set up the problem as an equality-constrained maximization problem, since the
constraints $x_1 \geq 0$ and $x_2 \geq 0$ can no longer be replaced with $x_1 > 0$ and
$x_2 > 0$. Indeed, it is obvious upon inspection that the solution to the problem is

$$
(x_1^*, x_2^*) =
\begin{cases}
(I/p_1, 0), & \text{if } p_1 < p_2 \\
(0, I/p_2), & \text{if } p_1 > p_2
\end{cases}
$$

and that any $(x_1, x_2) \in \mathbb{R}^2_+$ that satisfies $p_1 x_1 + p_2 x_2 = I$ is
optimal when $p_1 = p_2$. Thus, the constraints $x_i \geq 0$ "bite" when $p_1 \neq p_2$.

If we had ignored this problem, and attempted to use the Lagrangean method to
solve this problem, the following system of equations would have resulted:

$$
\begin{aligned}
1 - \lambda p_1 &= 0 \\
1 - \lambda p_2 &= 0 \\
p_1 x_1 + p_2 x_2 &= I.
\end{aligned}
$$

The first two equations are in contradiction except when $p_1 = p_2$, which is the only
case when the nonnegativity constraints do not bite. Thus, except in this case, the
Lagrangean method fails to identify a solution. □
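A small brute-force search makes the corner solution of Example 5.11 concrete. This is only a sketch: the prices, income, grid size, and function names below are hypothetical choices made for illustration. It searches over bundles that exhaust the budget (corners included) and separately checks whether the two multiplier equations can hold simultaneously.

```python
from fractions import Fraction

def best_bundle(p1, p2, income, steps=1000):
    # Brute-force search over bundles that exhaust the budget, including
    # the corners x1 = 0 and x2 = 0. Returns (utility, x1, x2).
    bundles = []
    for i in range(steps + 1):
        x1 = income * Fraction(i, steps) / p1
        x2 = (income - p1 * x1) / p2
        bundles.append((x1 + x2, x1, x2))
    return max(bundles)

def multiplier_equations_consistent(p1, p2):
    # 1 - lam * p1 = 0 and 1 - lam * p2 = 0 require lam = 1/p1 = 1/p2.
    return Fraction(1) / p1 == Fraction(1) / p2

# Hypothetical prices and income with p1 < p2.
p1, p2, income = Fraction(2), Fraction(5), Fraction(100)
utility, x1, x2 = best_bundle(p1, p2, income)
print(utility, x1, x2, multiplier_equations_consistent(p1, p2))
```

With $p_1 < p_2$ the search lands on the corner $(I/p_1, 0)$, while the multiplier equations are inconsistent, matching the discussion in the example.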


A similar problem can also arise in the cost-minimization exercise, as the next 
example demonstrates. 


Example 5.12 Suppose that in the problem of subsection 5.5.2, the production function
were modified to the linear function $g(x_1, x_2) = x_1 + x_2$. A little reflection shows
that, in this case, the solution to the problem is given by

$$
(x_1^*, x_2^*) =
\begin{cases}
(\bar{y}, 0), & \text{if } w_1 < w_2 \\
(0, \bar{y}), & \text{if } w_1 > w_2
\end{cases}
$$

and that any pair $(x_1, x_2) \in X(\bar{y})$ is optimal when $w_1 = w_2$. Thus, when
$w_1 \neq w_2$, the nonnegativity constraints matter in a nontrivial sense. If these
constraints are ignored and the problem is represented as an equality-constrained problem,
the Lagrangean has the form

$$
L(x_1, x_2, \lambda) = w_1 x_1 + w_2 x_2 + \lambda(x_1 + x_2 - \bar{y}),
$$

and the critical points of $L$ are the solutions to

$$
\begin{aligned}
w_1 + \lambda &= 0 \\
w_2 + \lambda &= 0 \\
x_1 + x_2 &= \bar{y}.
\end{aligned}
$$

The first two equations are consistent if and only if $w_1 = w_2$. Thus, if
$w_1 \neq w_2$, the Lagrangean method will not work. □


Finally, the following example shows that even if the equations that define the 
critical points of L do have a solution, the solution need not be the solution to 
the problem if the reduction of the original inequality-constrained problem to an 
equality-constrained one is not legitimate. 


Example 5.13 We consider the problem of identifying Pareto-optimal divisions of
a given amount of resources between two agents, as described in subsection 2.3.7.
Suppose there are two commodities, and the endowment vector $\omega \in \mathbb{R}^2_{++}$
consists of $x$ units of the first resource and $y$ units of the second resource. Let $x_i$
and $y_i$ denote respectively the amounts of resources $x$ and $y$ that are allocated to
agent $i$, $i = 1, 2$. Agent $i$'s utility from receiving the allocation $(x_i, y_i)$ is
assumed to be given by $u(x_i, y_i) = x_i y_i$. Given a weight $\alpha \in (0, 1)$, the
problem is to solve

Maximize $[\alpha x_1 y_1 + (1 - \alpha) x_2 y_2]$ subject to $(x_1, x_2, y_1, y_2) \in F(x, y)$,

where

$$
F(x, y) = \{(x_1, x_2, y_1, y_2) \in \mathbb{R}^4_+ \mid x_1 + x_2 \leq x,\ y_1 + y_2 \leq y\}.
$$


An optimum evidently exists in this problem, since the feasible set is compact
and the objective function is continuous. Moreover, it is also apparent that at the
optimum, both resources must be fully allocated; that is, we must have $x_1 + x_2 = x$
and $y_1 + y_2 = y$. If we ignore the nonnegativity constraints in this problem, and
write it as an equality-constrained problem with only the resource constraints, the
Lagrangean for the problem is

$$
L(x_1, x_2, y_1, y_2, \lambda_x, \lambda_y) = \alpha x_1 y_1 + (1 - \alpha) x_2 y_2
+ \lambda_x (x - x_1 - x_2) + \lambda_y (y - y_1 - y_2).
$$

The critical points of $L$ are the solutions to:

$$
\begin{aligned}
\alpha y_1 - \lambda_x &= 0 \\
(1 - \alpha) y_2 - \lambda_x &= 0 \\
\alpha x_1 - \lambda_y &= 0 \\
(1 - \alpha) x_2 - \lambda_y &= 0 \\
x - x_1 - x_2 &= 0 \\
y - y_1 - y_2 &= 0.
\end{aligned}
$$

A simple computation shows that these equations admit the unique solution

$$
\begin{aligned}
x_1^* &= (1 - \alpha) x, & y_1^* &= (1 - \alpha) y \\
x_2^* &= \alpha x, & y_2^* &= \alpha y \\
\lambda_x^* &= \alpha (1 - \alpha) y, & \lambda_y^* &= \alpha (1 - \alpha) x.
\end{aligned}
$$


However, it is easy to show that these values do not identify a solution to the
maximization problem. When they are substituted into the objective function, we obtain

$$
\begin{aligned}
\alpha x_1^* y_1^* + (1 - \alpha) x_2^* y_2^*
&= \alpha (1 - \alpha) x (1 - \alpha) y + (1 - \alpha) \alpha x \alpha y \\
&= \alpha (1 - \alpha) x y.
\end{aligned}
$$

On the other hand, if the entire available vector of resources is given over to a single
agent (say, agent 1), then the objective function has the value $\alpha x y$. Since
$\alpha \in (0, 1)$, this is strictly larger than $\alpha (1 - \alpha) x y$.

The problem here, as in the earlier examples, is that the nonnegativity constraints
bite at the optimum: viz., it can be shown that the solution to the problem is to turn
over all the resources to agent 1 if $\alpha > 1/2$, to agent 2 if $\alpha < 1/2$, and to
give it all to any one of the agents if $\alpha = 1/2$. □
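The comparison in Example 5.13 can be checked with exact arithmetic. In the sketch below, the weight $\alpha = 3/4$ and the endowment $(x, y) = (8, 4)$ are hypothetical values chosen for illustration, and the function names are not from the text. The code verifies that the stated values solve all six critical-point equations, then compares the objective there with its value when agent 1 receives everything.

```python
from fractions import Fraction

def objective(alpha, x1, y1, x2, y2):
    # Weighted sum of utilities u(x_i, y_i) = x_i * y_i.
    return alpha * x1 * y1 + (1 - alpha) * x2 * y2

def residuals(alpha, x1, x2, y1, y2, lam_x, lam_y, x, y):
    # Left-hand sides of the six equations defining the critical points of L.
    return (alpha * y1 - lam_x, (1 - alpha) * y2 - lam_x,
            alpha * x1 - lam_y, (1 - alpha) * x2 - lam_y,
            x - x1 - x2, y - y1 - y2)

# Hypothetical weight and endowment, for illustration only.
alpha, x, y = Fraction(3, 4), Fraction(8), Fraction(4)

x1, x2 = (1 - alpha) * x, alpha * x
y1, y2 = (1 - alpha) * y, alpha * y
lam_x, lam_y = alpha * (1 - alpha) * y, alpha * (1 - alpha) * x
assert residuals(alpha, x1, x2, y1, y2, lam_x, lam_y, x, y) == (0,) * 6

interior_value = objective(alpha, x1, y1, x2, y2)
corner_value = objective(alpha, x, y, 0, 0)   # everything to agent 1
print(interior_value, corner_value)
```

The critical point of the Lagrangean yields $\alpha(1 - \alpha)xy$, which falls short of the corner allocation's value $\alpha xy$.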


5.6 A Proof of the Theorem of Lagrange 
The following notational simplification will aid greatly in the proof of Theorem 5.1: 
1. We shall assume, without loss of generality, that the $k \times k$ submatrix of $Dg(x^*)$
that has full rank is the $k \times k$ submatrix consisting of the first $k$ rows and $k$
columns.

2. We will denote the first $k$ coordinates of a vector $x \in D$ by $w$, and the last
$(n - k)$ coordinates by $z$, i.e., we write $x = (w, z)$. In particular, we shall write
$(w^*, z^*)$ to denote the local optimum $x^*$.

3. We will denote by $Df_w(w, z)$ the derivative of $f$ at $(w, z)$ with respect to the $w$
variables alone, and by $Df_z(w, z)$ the derivative of $f$ at $(w, z)$ with respect to the
$z$ variables alone. $Dg_w(w, z)$ and $Dg_z(w, z)$ are defined similarly. Note that the
dimensions of the matrices $Df_w(w, z)$, $Df_z(w, z)$, $Dg_w(w, z)$, and $Dg_z(w, z)$
are, respectively, $1 \times k$, $1 \times (n - k)$, $k \times k$, and $k \times (n - k)$.

4. We shall treat the vector $\lambda^*$ in $\mathbb{R}^k$, whose existence we are to show, as
a $1 \times k$ matrix.⁹ Thus, for instance, we will write $\lambda^* Dg(x^*)$ to represent
$\sum_{i=1}^{k} \lambda_i^* Dg_i(x^*)$.

In this notation, we are given the data that $(w^*, z^*)$ is a local maximum¹⁰ of $f$ on
$D = U \cap \{(w, z) \in \mathbb{R}^n \mid g(w, z) = 0\}$; and that
$\rho(Dg_w(w^*, z^*)) = k$. We are to prove that there exists $\lambda^*$ such that

$$
\begin{aligned}
Df_w(w^*, z^*) + \lambda^* Dg_w(w^*, z^*) &= 0 \\
Df_z(w^*, z^*) + \lambda^* Dg_z(w^*, z^*) &= 0.
\end{aligned}
$$

As a first step, note that since $\rho(Dg_w(w^*, z^*)) = k$, the Implicit Function Theorem
(see Theorem 1.77 in Chapter 1) shows that there exists an open set $V$ in
$\mathbb{R}^{n-k}$ containing $z^*$, and a $C^1$ function $h: V \to \mathbb{R}^k$ such that
$h(z^*) = w^*$ and $g(h(z), z) = 0$ for all $z \in V$. Differentiating the identity
$g(h(z), z) = 0$ with respect to $z$ by using the chain rule, we obtain

$$
Dg_w(h(z), z) Dh(z) + Dg_z(h(z), z) = 0.
$$

At $z = z^*$, we have $h(z^*) = w^*$. Since $Dg_w(w^*, z^*)$ is invertible by assumption,
this implies

$$
Dh(z^*) = -[Dg_w(w^*, z^*)]^{-1} Dg_z(w^*, z^*).
$$

Now define $\lambda^*$ by

$$
\lambda^* = -Df_w(w^*, z^*) [Dg_w(w^*, z^*)]^{-1}.
$$


We will show that $\lambda^*$ so defined meets the required conditions. Indeed, it follows
from the very definition of $\lambda^*$ that when both sides are postmultiplied by
$Dg_w(w^*, z^*)$, we obtain

$$
\begin{aligned}
\lambda^* Dg_w(w^*, z^*) &= -Df_w(w^*, z^*) [Dg_w(w^*, z^*)]^{-1} Dg_w(w^*, z^*) \\
&= -Df_w(w^*, z^*),
\end{aligned}
$$


⁹ As we mentioned in Chapter 1, it is customary to treat vectors as column vectors, and to
use the transpose to represent them as row vectors. We are not following this rule here.

¹⁰ We assume in this proof that $x^*$ is a local maximum; the proof for a local minimum
follows identical steps, with obvious changes.

which is the same as

$$
Df_w(w^*, z^*) + \lambda^* Dg_w(w^*, z^*) = 0.
$$

Thus, it remains only to be shown that

$$
Df_z(w^*, z^*) + \lambda^* Dg_z(w^*, z^*) = 0.
$$


To this end, define the function $F: V \to \mathbb{R}$ by $F(z) = f(h(z), z)$. Since $f$ has
a local maximum at $(w^*, z^*) = (h(z^*), z^*)$, it is immediate that $F$ has a local
maximum at $z^*$. Since $V$ is open, $z^*$ is an unconstrained local maximum of $F$, and the
first-order conditions for an unconstrained maximum now imply that $DF(z^*) = 0$,
or that

$$
Df_w(w^*, z^*) Dh(z^*) + Df_z(w^*, z^*) = 0.
$$

Substituting for $Dh(z^*)$, we obtain

$$
-Df_w(w^*, z^*) [Dg_w(w^*, z^*)]^{-1} Dg_z(w^*, z^*) + Df_z(w^*, z^*) = 0,
$$

which, from the definition of $\lambda^*$, is the same as

$$
Df_z(w^*, z^*) + \lambda^* Dg_z(w^*, z^*) = 0.
$$

Theorem 5.1 is proved. □
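To make the construction in this proof concrete, the following sketch applies it with $k = 1$ to the utility-maximization problem of subsection 5.5.1, taking $w = x_1$ and $z = x_2$. The prices and income are hypothetical, and the variable names are chosen here for illustration. It computes $Dh(z^*)$ and $\lambda^*$ from the formulas in the proof and checks both first-order conditions, along with $DF(z^*) = 0$.

```python
from fractions import Fraction

# Hypothetical prices and income, for illustration only.
p1, p2, income = Fraction(2), Fraction(5), Fraction(100)

# Local optimum from subsection 5.5.1, split as (w*, z*) = (x1*, x2*).
w_star, z_star = income / (2 * p1), income / (2 * p2)

# Partial derivatives of f(w, z) = w * z and g(w, z) = I - p1 * w - p2 * z.
df_w, df_z = z_star, w_star
dg_w, dg_z = -p1, -p2

# Dg_w is nonzero, so g(h(z), z) = 0 can be solved for w = h(z) near z*.
def h(z):
    return (income - p2 * z) / p1

dh = -dg_z / dg_w            # Dh(z*) = -[Dg_w]^{-1} Dg_z
lam = -df_w / dg_w           # lambda* = -Df_w [Dg_w]^{-1}

# The two first-order conditions of the Theorem of Lagrange.
condition_w = df_w + lam * dg_w
condition_z = df_z + lam * dg_z

# DF(z*) for F(z) = f(h(z), z), via the chain rule.
dF = df_w * dh + df_z
print(lam, condition_w, condition_z, dF, h(z_star) == w_star)
```

The multiplier produced this way coincides with $\lambda^* = I/(2p_1 p_2)$ found in Step 2 of subsection 5.5.1.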


5.7 A Proof of the Second-Order Conditions 


We prove parts 1 and 3 of Theorem 5.4 here. Parts 2 and 4 are proved analogously. 
The details are left as an exercise to the reader. 


Proof of Part 1 of Theorem 5.4 


We will prove Part 1 of Theorem 5.4 for the case where there is just a single constraint,
i.e., where $g: \mathbb{R}^n \to \mathbb{R}$. The general case follows exactly along the same
lines, except that notation gets considerably more involved.¹¹ The proof we will present is
a direct extension of the proof of the first-order necessary conditions provided in
Section 5.6. In particular, we will use the notation introduced there, adapted to the case
$k = 1$. That is:

• We will denote a typical point $x \in \mathbb{R}^n$ by $(w, z)$, where $w \in \mathbb{R}$
and $z \in \mathbb{R}^{n-1}$. The point $x^*$ will be denoted $(w^*, z^*)$.


¹¹ Specifically, the added complication derives from the fact that when there is only a
single constraint $g$, the second-derivative $D^2 g$ is an $n \times n$ matrix, while in the
general case where $g = (g_1, \ldots, g_k)$, $Dg$ is itself an $n \times k$ matrix, so
$D^2 g = D(Dg)$ is an array of dimensions $n \times k \times n$. A typical entry in this
array is $\partial^2 g_i(x) / \partial x_j \partial x_l$, where $i = 1, \ldots, k$ and
$j, l = 1, \ldots, n$.

• We will denote by $Df_w(w, z)$ and $Df_z(w, z)$ the derivatives of $f$ with respect to
the $w$-variable and the $z$-variables. The terms $Dg_w$ and $Dg_z$ are defined similarly.
Of course, we have $Df = (Df_w, Df_z)$ and $Dg = (Dg_w, Dg_z)$.

• We will assume, without loss of generality, that the $1 \times 1$ submatrix of
$Dg(w^*, z^*)$ that has full rank is $Dg_w(w^*, z^*)$. By the Implicit Function Theorem,
there is a neighborhood $V$ of $z^* \in \mathbb{R}^{n-1}$, and a $C^1$ function
$h: V \to \mathbb{R}$ such that $h(z^*) = w^*$, and $g(h(z), z) = 0$ for all $z \in V$.


In an extension of this notation to second-derivatives, we define $D^2 f_{ww}$ to be the
derivative of $Df_w$ with respect to the $w$-variable; $D^2 f_{wz}$ to be the derivative of
$Df_w$ with respect to the $z$-variables; and $D^2 f_{zz}$ to be the derivative of $Df_z$
with respect to the $z$-variables. The terms $D^2 g_{ww}$, $D^2 g_{wz}$, and $D^2 g_{zz}$
are defined analogously.¹² Finally, for notational ease, we will simply use the superscript
"*" to denote evaluation of the derivatives of $f$ and $g$ at $x^* = (w^*, z^*)$. So, for
instance, $Df^*$ will denote $Df(w^*, z^*)$, $D^2 f^*_{ww}$ will denote
$D^2 f_{ww}(w^*, z^*)$, $D^2 g^*_{wz}$ will denote $D^2 g_{wz}(w^*, z^*)$, etc.

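In this notation, we are given that $f$ and $g$ are $C^2$ functions mapping $\mathbb{R}^n$
into $\mathbb{R}$; that $(w^*, z^*)$ is a local maximum of $f$ on the set

$$
D = U \cap \{(w, z) \in \mathbb{R}^n \mid g(w, z) = 0\},
$$

and that $\rho(Dg_w(w^*, z^*)) = 1$, where $U \subset \mathbb{R}^n$ is open. We have shown in
Section 5.6, under these conditions, that if

$$
\lambda^* = -Df^*_w [Dg^*_w]^{-1},
$$

then

$$
Df(x^*) + \lambda^* Dg(x^*) =
\begin{bmatrix} Df^*_w + \lambda^* Dg^*_w \\ Df^*_z + \lambda^* Dg^*_z \end{bmatrix} = 0.
$$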


We are now ready to show that, if $D^2 L^*$ denotes the $n \times n$ quadratic form
$D^2 f(w^*, z^*) + \lambda^* D^2 g(w^*, z^*)$, we have

$$
x' D^2 L^* x \leq 0 \quad \text{for all } x \in Z^* = \{x \in \mathbb{R}^n \mid Dg(w^*, z^*) x = 0\}.
$$

As a first step, note that since $g(h(z), z) = 0$ for all $z \in V$, we have

$$
Dg_w(w, z) Dh(z) + Dg_z(w, z) = 0
$$


for all $(w, z)$. Differentiating again with respect to $z$, and suppressing notation on $(w, z)$, we obtain

\[ (Dh)'[D^2 g_{ww} Dh + D^2 g_{wz}] + Dg_w D^2 h + (Dh)' D^2 g_{wz} + D^2 g_{zz} = 0. \]

12 Note that, in full notation, we have

\[ D^2 f = \begin{bmatrix} D^2 f_{ww} & D^2 f_{wz} \\ (D^2 f_{wz})' & D^2 f_{zz} \end{bmatrix}, \quad \text{and} \quad D^2 g = \begin{bmatrix} D^2 g_{ww} & D^2 g_{wz} \\ (D^2 g_{wz})' & D^2 g_{zz} \end{bmatrix}. \]


Evaluating these expressions at $(w^*, z^*)$, and using the assumption that $Dg^*_w \ne 0$, we obtain

\[ Dh^* = -[Dg^*_w]^{-1} Dg^*_z, \]
\[ D^2 h^* = -[Dg^*_w]^{-1} \left( (Dh^*)' D^2 g^*_{ww} Dh^* + 2(Dh^*)' D^2 g^*_{wz} + D^2 g^*_{zz} \right). \]

Now, define $F: V \to \mathbb{R}$ by $F(z) = f(h(z), z)$. Since $f$ has a local maximum at $(w^*, z^*)$, it is easy to see that $F$ has a local maximum on $V$ at $z^*$. Therefore, we must have $z' D^2 F(z^*) z \le 0$ for all $z \in \mathbb{R}^{n-1}$. Since $DF(z) = Df_w(h(z), z) Dh(z) + Df_z(h(z), z)$ at all $z \in V$, we have

\[ D^2 F(z^*) = (Dh^*)' D^2 f^*_{ww} Dh^* + 2(Dh^*)' D^2 f^*_{wz} + Df^*_w D^2 h^* + D^2 f^*_{zz}. \]

Substituting for $D^2 h^*$, using the fact that $\lambda^* = -Df^*_w [Dg^*_w]^{-1}$, and writing $D^2 L^*_{ww}$ for $D^2 f^*_{ww} + \lambda^* D^2 g^*_{ww}$, etc., we obtain

\[ D^2 F(z^*) = (Dh^*)' D^2 L^*_{ww} Dh^* + 2(Dh^*)' D^2 L^*_{wz} + D^2 L^*_{zz}, \]


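where $Dh^* = -[Dg^*_w]^{-1} Dg^*_z$. Since $D^2 F(z^*)$ is a negative semidefinite quadratic form, we have shown that the expression on the right-hand side of this equation is also a negative semidefinite quadratic form on $\mathbb{R}^{n-1}$, i.e., we must have

\[ (Dh^* \xi)' D^2 L^*_{ww} Dh^* \xi + 2 (Dh^* \xi)' D^2 L^*_{wz} \xi + \xi' D^2 L^*_{zz} \xi \le 0 \]

for all $\xi \in \mathbb{R}^{n-1}$. In more compact matrix notation, we can write this as

\[ \begin{bmatrix} Dh^* \xi \\ \xi \end{bmatrix}' \begin{bmatrix} D^2 L^*_{ww} & D^2 L^*_{wz} \\ (D^2 L^*_{wz})' & D^2 L^*_{zz} \end{bmatrix} \begin{bmatrix} Dh^* \xi \\ \xi \end{bmatrix} \le 0 \]

for all $\xi \in \mathbb{R}^{n-1}$. To complete the proof, we will now show that this is precisely the same thing as the negative semidefiniteness of $D^2 L^*$ on the set $Z^*$. Indeed, this is immediate: note that since $Dg(w^*, z^*) = (Dg_w(w^*, z^*), Dg_z(w^*, z^*))$, it is the case that $(\omega, \xi) \in Z^*$ if and only if $Dg^*_w \omega + Dg^*_z \xi = 0$, i.e., if and only if

\[ \omega = -[Dg^*_w]^{-1} Dg^*_z \xi = Dh^* \xi. \]

Therefore, $D^2 L^*$ is negative semidefinite on $Z^*$ if and only if for all $\xi \in \mathbb{R}^{n-1}$, we have

\[ \begin{bmatrix} Dh^* \xi \\ \xi \end{bmatrix}' \begin{bmatrix} D^2 L^*_{ww} & D^2 L^*_{wz} \\ (D^2 L^*_{wz})' & D^2 L^*_{zz} \end{bmatrix} \begin{bmatrix} Dh^* \xi \\ \xi \end{bmatrix} \le 0. \]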


The result is proved.

Proof of Part 3 of Theorem 5.4


The sufficient conditions given in Part 3 of Theorem 5.4 may be established by reversing the arguments used in proving the necessary conditions of Part 1. That is, we begin with the condition that $D^2 L^*$ is negative definite on the set $Z^*$, show that this implies that the second derivative $D^2 F(z^*)$ of the function $F(z) = f(h(z), z)$ is negative definite on $\mathbb{R}^{n-1}$, and use this to conclude that $f$ has a strict local maximum at $(w^*, z^*)$. This is a relatively straightforward procedure, so we do not fill in the details here. Instead, we offer an alternative method of establishing Part 3, which uses a completely different approach.

We revert to the notation of Theorem 5.4, and drop the simplifying assumption that $k = 1$. We are given that $x^*$ satisfies $\rho(Dg(x^*)) = k$, and that there exists $\lambda^* \in \mathbb{R}^k$ such that the following two conditions are met:

\[ Df(x^*) + \sum_{i=1}^{k} \lambda^*_i Dg_i(x^*) = 0, \]
\[ x' D^2 L^* x < 0 \quad \text{for all } 0 \ne x \in Z^* = \{v \in \mathbb{R}^n \mid Dg(x^*)v = 0\}. \]

We are to show that $x^*$ is a strict local maximum of $f$ on $\mathcal{D}$, i.e., that there exists $r > 0$ such that $f(x^*) > f(x)$ for all $x \in B(x^*, r) \cap \mathcal{D}$ with $x \ne x^*$.

We use a reductio ad absurdum approach. Suppose that under the stated conditions, the conclusion of the theorem were not true. Then, it is the case that for all $r > 0$, there exists $x(r) \in B(x^*, r) \cap \mathcal{D}$ such that $x(r) \ne x^*$ and $f(x(r)) \ge f(x^*)$. Pick any sequence $r_l > 0$, $r_l \downarrow 0$, and let $x_l = x(r_l)$. We shall use the sequence $\{x_l\}$ to show that there must exist at least one point $y \in \mathbb{R}^n$ with the following properties:

1. $y \ne 0$.

2. $Dg(x^*)y = 0$.

3. $y' D^2 L^* y \ge 0$.

This will furnish the required contradiction, since, by hypothesis, we must have $y' D^2 L^* y < 0$ for all $y \ne 0$ such that $Dg(x^*)y = 0$. Taylor's Theorem, which was described in Chapter 1 (see Theorem 1.75), will play a central role in this process.
Define a sequence $\{y_l\}$ in $\mathbb{R}^n$ by

\[ y_l = \frac{x_l - x^*}{\|x_l - x^*\|}. \]

Since $x_l \ne x^*$ for any $l$, the sequence $\{y_l\}$ is well-defined. Moreover, we have $\|y_l\| = 1$ for all $l$, so the sequence $\{y_l\}$ lies in the unit circle $C^{n-1}$ in $\mathbb{R}^n$:

\[ C^{n-1} = \{y \in \mathbb{R}^n \mid \|y\| = 1\}. \]

As a closed and bounded subset of $\mathbb{R}^n$, $C^{n-1}$ is compact, so the sequence $\{y_l\}$ admits a convergent subsequence $\{y_{m(l)}\}$ converging to a point $y \in C^{n-1}$. Observe that since we must have $\|y\| = 1$, it is certainly the case that $y \ne 0$.

We will now show that $Dg(x^*)y = 0$. Using a first-order Taylor series expansion of $g$ around $x^*$, we have

\[ g(x_{m(l)}) = g(x^*) + Dg(x^*)(x_{m(l)} - x^*) + R_1(x_{m(l)}, x^*). \]

By definition of $x^*$, and by choice of $x_{m(l)}$, we must have $g(x^*) = g(x_{m(l)}) = 0$. Therefore,

\[ 0 = \frac{Dg(x^*)(x_{m(l)} - x^*)}{\|x_{m(l)} - x^*\|} + \frac{R_1(x_{m(l)}, x^*)}{\|x_{m(l)} - x^*\|} = Dg(x^*) y_{m(l)} + \frac{R_1(x_{m(l)}, x^*)}{\|x_{m(l)} - x^*\|}. \]

By Taylor's Theorem, we must have $R_1(x_{m(l)}, x^*)/\|x_{m(l)} - x^*\| \to 0$ as $x_{m(l)} \to x^*$. Therefore, taking limits as $l \to \infty$, and noting that $y_{m(l)} \to y$, we obtain

\[ 0 = Dg(x^*)y. \]


Finally, to complete the proof, we will show that $y' D^2 L^* y \ge 0$. For notational ease, let $L(\cdot)$ denote $L(\cdot; \lambda^*)$. Since $L(\cdot)$ is $C^2$, a second-order Taylor series expansion around $x^*$ yields

\[ L(x_{m(l)}) = L(x^*) + DL(x^*)(x_{m(l)} - x^*) + \frac{1}{2}(x_{m(l)} - x^*)' D^2 L(x^*)(x_{m(l)} - x^*) + R_2(x_{m(l)}, x^*). \]

Now, we also have

1. $DL(x^*) = Df(x^*) + \sum_{i=1}^{k} \lambda^*_i Dg_i(x^*) = 0$.
2. $D^2 L(x^*) = D^2 L^*$.
3. $L(x_{m(l)}) = f(x_{m(l)}) + \sum_{i=1}^{k} \lambda^*_i g_i(x_{m(l)}) = f(x_{m(l)})$, since $g(x_{m(l)}) = 0$ by choice of $x_{m(l)}$.
4. $L(x^*) = f(x^*) + \sum_{i=1}^{k} \lambda^*_i g_i(x^*) = f(x^*)$, since $g(x^*) = 0$ by definition of $x^*$.

Substituting these into the Taylor expansion of $L(\cdot)$, we obtain

\[ f(x_{m(l)}) = f(x^*) + \frac{1}{2}(x_{m(l)} - x^*)' D^2 L^* (x_{m(l)} - x^*) + R_2(x_{m(l)}, x^*). \]

Rearranging terms, dividing through by $\|x_{m(l)} - x^*\|^2$, and using the fact that

\[ y_{m(l)} = \frac{x_{m(l)} - x^*}{\|x_{m(l)} - x^*\|}, \]

we see that

\[ \frac{f(x_{m(l)}) - f(x^*)}{\|x_{m(l)} - x^*\|^2} = \frac{1}{2} y_{m(l)}' D^2 L^* y_{m(l)} + \frac{R_2(x_{m(l)}, x^*)}{\|x_{m(l)} - x^*\|^2}. \]

By the definition of the sequence $\{x_l\}$, the left-hand side of this equation must be nonnegative at each $l$. Since $R_2(x_{m(l)}, x^*)/\|x_{m(l)} - x^*\|^2 \to 0$ as $x_{m(l)} \to x^*$, taking limits as $l \to \infty$ now results in

\[ 0 \le y' D^2 L^* y, \]

as required. □
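
The sufficiency condition just proved can be checked numerically on a small problem. The sketch below is an illustration only: the problem (maximize f(x, y) = xy subject to x + y - 2 = 0), the candidate point (1, 1), the multiplier -1, and the helper name quad_form are all assumptions chosen for the example and do not come from the text. It verifies the first-order condition and then evaluates the quadratic form of D^2 L* on a vector spanning Z*.

```python
# Hypothetical example (not from the text): maximize f(x, y) = x*y
# subject to g(x, y) = x + y - 2 = 0, at the candidate x* = (1, 1).

def quad_form(H, v):
    """Return v' H v for a square matrix H given as nested lists."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

x_star = (1.0, 1.0)
lam_star = -1.0

Df = (x_star[1], x_star[0])   # gradient of f(x, y) = x*y at x*
Dg = (1.0, 1.0)               # gradient of g(x, y) = x + y - 2

# First-order condition: Df(x*) + lam* Dg(x*) = 0.
first_order = tuple(Df[i] + lam_star * Dg[i] for i in range(2))

# D^2 L* = D^2 f + lam* D^2 g; g is linear here, so D^2 g = 0.
D2L = [[0.0, 1.0], [1.0, 0.0]]

# In R^2 with one constraint, Z* = {v : Dg v = 0} is spanned by one vector.
v = (Dg[1], -Dg[0])
q = quad_form(D2L, v)

print(first_order, v, q)
```

Because Z* is one-dimensional in this example, a negative value of q on the single spanning vector is enough to conclude that D^2 L* is negative definite on Z*; in higher dimensions one would test a basis of Z* through the restricted matrix instead.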


5.8 Exercises 

1. Find the maximum and minimum of $f(x, y) = x^2 - y^2$ on the unit circle $x^2 + y^2 = 1$ using the Lagrange multipliers method. Using the substitution $y^2 = 1 - x^2$, solve the same problem as a single variable unconstrained problem. Do you get the same results? Why or why not?

2, Show that the problem of maximizing f(x, y) = x? + y? on the constraint set 
D = {(x, y) | x + y = 1} has no solution. Show also that if the Lagrangean 
method were used on this problem, the critical points of the Lagrangean have a 
unique solution. Is the point identified by this solution either a local maximum 
or a (local or global) minimum? 


3. Find the maxima and minima of the following functions subject to the specified constraints:
(a) $f(x, y) = xy$ subject to $x^2 + y^2 = 2a^2$.
(b) $f(x, y) = 1/x + 1/y$ subject to $(1/x)^2 + (1/y)^2 = (1/a)^2$.
(c) $f(x, y, z) = x + y + z$ subject to $(1/x) + (1/y) + (1/z) = 1$.
(d) $f(x, y, z) = xyz$ subject to $x + y + z = 5$ and $xy + xz + yz = 8$.
(e) $f(x, y) = x + y$ for $xy = 16$.
(f) $f(x, y, z) = x^2 + 2y - z^2$ subject to $2x - y = 0$ and $x + z = 6$.

4. Maximize and minimize $f(x, y) = x + y$ on the lemniscate $(x^2 - y^2)^2 = x^2 + y^2$.

5. Consider the problem:

\[ \min\ x^2 + y^2 \quad \text{subject to } (x - 1)^3 - y^2 = 0. \]

(a) Solve the problem geometrically.
(b) Show that the method of Lagrange multipliers does not work in this case. Can you explain why?


6. Consider the following problem where the objective function is quadratic and the constraints are linear:

\[ \max\ c'x + \frac{1}{2} x'Dx \quad \text{subject to } Ax = b \]

where $c$ is a given $n$-vector, $D$ is a given $n \times n$ symmetric, negative definite matrix, and $A$ is a given $m \times n$ matrix.

(a) Set up the Lagrangean and obtain the first-order conditions.
(b) Solve for the optimal vector $x^*$ as a function of $A$, $b$, $c$, and $D$.

7. Solve the problem

\[ \max\ f(x) = x'Ax \quad \text{subject to } x \cdot x = 1 \]

where $A$ is a given symmetric matrix.


8. A firm's inventory of a certain homogeneous commodity, $I(t)$, is depleted at a constant rate per unit time $dI/dt$, and the firm reorders an amount $x$ of the commodity, which is delivered immediately, whenever the level of inventory is zero. The annual requirement for the commodity is $A$, and the firm orders the commodity $n$ times a year where

\[ A = nx. \]

The firm incurs two types of inventory costs: a holding cost and an ordering cost. The average stock of inventory is $x/2$, and the cost of holding one unit of the commodity is $C_h$, so $C_h x/2$ is the holding cost. The firm orders the commodity, as stated above, $n$ times a year, and the cost of placing one order is $C_0$, so $C_0 n$ is the ordering cost. The total cost is then:

\[ C = C_h \frac{x}{2} + C_0 n. \]

(a) In a diagram show how the inventory level varies over time. Prove that the average inventory level is $x/2$.

(b) Minimize the cost of inventory, $C$, by choice of $x$ and $n$ subject to the constraint $A = nx$ using the Lagrange multiplier method. Find the optimal $x$ as a function of the parameters $C_0$, $C_h$, and $A$. Interpret the Lagrange multiplier.


9. Suppose the utility function in subsection 5.5.1 is modified to


u(x, x2) = age, 


where $\alpha, \beta > 0$. Under what circumstances, if any, can the problem now be reduced to an equality-constrained optimization problem?


10. Consider the cost-minimization problem of subsection 5.5.2, but with the production function $g$ modified to

\[ g(x_1, x_2) = x_1^2 + x_2^2. \]

(a) Let $y = 1$. Represent the set $\{(x_1, x_2) \in \mathbb{R}^2_+ \mid x_1^2 + x_2^2 \ge y\}$ in a diagram, and argue using this diagram that the solution to the problem is

\[ (x_1^*, x_2^*) = \begin{cases} (1, 0), & \text{if } w_1 < w_2 \\ (0, 1), & \text{if } w_1 > w_2 \end{cases} \]

with either $(1, 0)$ or $(0, 1)$ being optimal if $w_1 = w_2$.

(b) Show that if the nonnegativity constraints are ignored and the Lagrangean method is employed, the method fails to identify the solution, regardless of the values of $w_1$ and $w_2$.


11. Consider the problem of maximizing the utility function

\[ u(x, y) = x^{1/2} + y^{1/2} \]

on the budget set $\{(x, y) \in \mathbb{R}^2_+ \mid px + y = 1\}$. Show that if the nonnegativity constraints $x \ge 0$ and $y \ge 0$ are ignored, and the problem is written as an equality-constrained one, the resulting Lagrangean has a unique critical point. Does this critical point identify a solution to the problem? Why or why not?


6 


Inequality Constraints and the Theorem of 
Kuhn and Tucker 


Building on the analysis of the previous chapter, we now turn to a study of optimization problems defined by inequality constraints. The constraint set will now be assumed to have the form

\[ \mathcal{D} = U \cap \{x \in \mathbb{R}^n \mid h_i(x) \ge 0,\ i = 1, \ldots, l\}, \]

where $U \subset \mathbb{R}^n$ is open, and $h_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, l$. The centerpiece of this chapter is the Theorem of Kuhn and Tucker, which describes necessary conditions for local optima in such problems. Following the description of the theorem, and of its use in locating optima in inequality-constrained problems, we show that the Theorem of Lagrange may be combined with the Theorem of Kuhn and Tucker to obtain necessary conditions for local optima in the general case of optimization problems defined by mixed constraints, where the constraint set takes the form

\[ \mathcal{D} = U \cap \{x \in \mathbb{R}^n \mid g_i(x) = 0,\ i = 1, \ldots, k,\ h_j(x) \ge 0,\ j = 1, \ldots, l\}. \]


6.1 The Theorem of Kuhn and Tucker 


The Theorem of Kuhn and Tucker provides an elegant characterization of the behavior of the objective function $f$ and the constraint functions $h_i$ at local optima of inequality-constrained optimization problems. The conditions it describes may be viewed as the first-order necessary conditions for local optima in these problems. The statement of the theorem, and a discussion of some of its components, is the subject of this section.


6.1.1 Statement of the Theorem 


In the statement of the theorem, as elsewhere in the sequel, we say that an inequality constraint $h_i(x) \ge 0$ is effective at a point $x^*$ if the constraint holds with equality at $x^*$, that is, we have $h_i(x^*) = 0$. We will also use the expression $|E|$ to denote the cardinality of a finite set $E$, i.e., the number of elements in the set $E$.


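Theorem 6.1 (Theorem of Kuhn and Tucker) Let $f: \mathbb{R}^n \to \mathbb{R}$ and $h_i: \mathbb{R}^n \to \mathbb{R}$ be $C^1$ functions, $i = 1, \ldots, l$. Suppose $x^*$ is a local maximum of $f$ on

\[ \mathcal{D} = U \cap \{x \in \mathbb{R}^n \mid h_i(x) \ge 0,\ i = 1, \ldots, l\}, \]

where $U$ is an open set in $\mathbb{R}^n$. Let $E \subset \{1, \ldots, l\}$ denote the set of effective constraints at $x^*$, and let $h_E = (h_i)_{i \in E}$. Suppose $\rho(Dh_E(x^*)) = |E|$. Then, there exists a vector $\lambda^* = (\lambda^*_1, \ldots, \lambda^*_l) \in \mathbb{R}^l$ such that the following conditions are met:

[KT-1] $\lambda^*_i \ge 0$ and $\lambda^*_i h_i(x^*) = 0$ for $i = 1, \ldots, l$.

[KT-2] $Df(x^*) + \sum_{i=1}^{l} \lambda^*_i Dh_i(x^*) = 0$.

Proof See Section 6.5.1 □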


Although we have stated the Theorem of Kuhn and Tucker for local maxima, the 
theorem is easily extended to cover local minima. For, if x* were a local minimum 
of f on D, x* would be a local maximum of — f on D. Since D(— f) = — Df, we 
have the following: 


Corollary 6.2 Suppose $f$ and $\mathcal{D}$ are defined as in Theorem 6.1, and $x^*$ is a local minimum of $f$ on $\mathcal{D}$. Let $E$ be the set of effective constraints at $x^*$, let $h_E = (h_i)_{i \in E}$, and suppose that $\rho(Dh_E(x^*)) = |E|$. Then, there exists $\lambda^* \in \mathbb{R}^l$ such that

[KT-1'] $\lambda^*_i \ge 0$ and $\lambda^*_i h_i(x^*) = 0$ for all $i$.

[KT-2'] $Df(x^*) - \sum_{i=1}^{l} \lambda^*_i Dh_i(x^*) = 0$.


Proof Follows immediately from Theorem 6.1. □


Remark In the sequel, we will say that a pair $(x^*, \lambda^*)$ meets the first-order necessary conditions for a maximum (or that it meets the Kuhn-Tucker first-order conditions for a maximum) in a given inequality-constrained maximization problem, if $(x^*, \lambda^*)$ satisfies $h(x^*) \ge 0$ as well as conditions [KT-1] and [KT-2] of Theorem 6.1. Similarly, we will say that $(x^*, \lambda^*)$ meets the first-order necessary conditions for a minimum (or that it meets the Kuhn-Tucker first-order conditions for a minimum) if it satisfies $h(x^*) \ge 0$ as well as conditions [KT-1'] and [KT-2'] of Corollary 6.2. □

1 With the (important) exception of the nonnegativity of the vector $\lambda$, the conclusions of the Theorem of Kuhn and Tucker can be derived from the Theorem of Lagrange. Indeed, this constitutes the starting point of our proof of the Theorem of Kuhn and Tucker in Section 6.5.


Condition [KT-1] in Theorem 6.1 (which is the same as condition [KT-1'] in Corollary 6.2) is called the condition of “complementary slackness.” The terminology arises simply from the observation that by the feasibility of $x^*$, we must have $h_i(x^*) \ge 0$ for each $i$; therefore, for $\lambda^*_i h_i(x^*) = 0$ to hold alongside $\lambda^*_i \ge 0$, we must have $\lambda^*_i = 0$ if $h_i(x^*) > 0$, and $h_i(x^*) = 0$ if $\lambda^*_i > 0$. That is, if one inequality is “slack” (holds strictly), the other cannot be.

It must be stressed that the Theorem of Kuhn and Tucker provides conditions that 
are only necessary for local optima, and at that, only for local optima that meet the 
constraint qualification. These conditions are not claimed to be sufficient to identify 
a point as being a local optimum, and indeed, it is easy to construct examples to show 
that the conditions cannot be sufficient. Here is a particularly simple one: 


Example 6.3 Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^3$ and $g(x) = x$, respectively. Consider the problem of maximizing $f$ on the set $\mathcal{D} = \{x \in \mathbb{R} \mid g(x) \ge 0\}$.

Let $x^* = \lambda^* = 0$. Then, $x^*$ is evidently in the feasible set, and the problem's single constraint is effective at $x^*$. Moreover, $g'(x) = 1$ for all $x$, so the constraint qualification holds at all $x$, and, in particular, at $x^*$. Finally, note that

\[ f'(x^*) + \lambda^* g'(x^*) = 0. \]

Thus, if the conditions of the Theorem of Kuhn and Tucker were also sufficient, $x^*$ would be a local maximum of $f$ on $\mathcal{D}$. However, $f$ is strictly increasing at $x^* = 0$, so quite evidently $x^*$ cannot be such a local maximum. □
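
The point of this example can be replayed in a few lines of Python. This is only a sketch: it takes the objective to be f(x) = x**3 (an assumption about the example's objective), and the names kt1, kt2, feasible and improving are illustrative. It confirms that the pair (0, 0) passes the Kuhn-Tucker first-order conditions and then exhibits nearby feasible points with a strictly higher objective value.

```python
# Assumed data for the sketch: f(x) = x**3, g(x) = x, candidate x* = lam* = 0.
def f(x):
    return x ** 3

def df(x):
    return 3 * x ** 2

def g(x):
    return x

def dg(x):
    return 1.0

x_star, lam_star = 0.0, 0.0

feasible = g(x_star) >= 0                              # x* lies in D
kt1 = lam_star >= 0 and lam_star * g(x_star) == 0      # complementary slackness
kt2 = df(x_star) + lam_star * dg(x_star) == 0          # stationarity

# Feasible points arbitrarily close to x* with a strictly larger objective.
improving = [eps for eps in (0.1, 0.01, 0.001)
             if g(x_star + eps) >= 0 and f(x_star + eps) > f(x_star)]

print(feasible, kt1, kt2, improving)
```

All three conditions evaluate to True while every listed step improves on f(x*), which is exactly the gap between necessity and sufficiency that the example describes.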


This example notwithstanding, the Theorem of Kuhn and Tucker turns out to be quite useful in practice in identifying optima of inequality-constrained problems. Its use in this direction is explored in Sections 6.2 and 6.3 below.

In the subsections that follow, we elaborate on two aspects of the Theorem of Kuhn and Tucker. Subsection 6.1.2 discusses the importance of the rank condition that $\rho(Dh_E(x^*)) = |E|$. Subsection 6.1.3 then sketches an interpretation of the vector $\lambda^*$, whose existence the theorem asserts.


6.1.2 The Constraint Qualification 


As with the analogous condition in the Theorem of Lagrange, the condition in the Theorem of Kuhn and Tucker that the rank of $Dh_E(x^*)$ be equal to $|E|$ is called the constraint qualification. This condition plays a central role in the proof of the theorem (see Section 6.5 below). Moreover, if the constraint qualification fails, the theorem itself could fail. Here is an example to illustrate this point:


Example 6.4 Let $f: \mathbb{R}^2 \to \mathbb{R}$ and $h: \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x, y) = -(x^2 + y^2)$ and $h(x, y) = (x - 1)^3 - y^2$, respectively. Consider the problem of maximizing $f$ on the set

\[ \mathcal{D} = \{(x, y) \in \mathbb{R}^2 \mid h(x, y) \ge 0\}. \]

A solution to this problem can be obtained by inspection: the function $f$ reaches a maximum at the point where $(x^2 + y^2)$ reaches a minimum. Since the constraint requires $(x - 1)^3 \ge y^2$, and $y^2 \ge 0$ for all $y$, the smallest absolute value of $x$ on $\mathcal{D}$ is $x = 1$, which occurs when $y = 0$; and, of course, the smallest absolute value of $y$ on $\mathcal{D}$ is $y = 0$. It follows that $f$ is maximized at $(x^*, y^*) = (1, 0)$. Note that the problem's single constraint is effective at this point.

At this global maximum of $f$ on $\mathcal{D}$, we have $Dh(x^*, y^*) = (3(x^* - 1)^2, -2y^*) = (0, 0)$, so $\rho(Dh(x^*, y^*)) = 0 < 1$, and the constraint qualification fails. On the other hand, we have $Df(x^*, y^*) = (-2x^*, -2y^*) = (-2, 0)$, so there cannot exist $\lambda \ge 0$ such that $Df(x^*, y^*) + \lambda Dh(x^*, y^*) = (0, 0)$. Therefore, the conclusions of the Theorem of Kuhn and Tucker also fail. □
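
A short numerical check makes the failure concrete. The sketch below assumes the functions of this example are f(x, y) = -(x^2 + y^2) and h(x, y) = (x - 1)^3 - y^2; the helper names grad_f, grad_h, cq_holds and residuals are illustrative. It evaluates both gradients at (1, 0) and shows that the stationarity residual is the same for every trial multiplier, because the gradient of h vanishes there.

```python
# Assumed data: f(x, y) = -(x^2 + y^2), h(x, y) = (x - 1)^3 - y^2.
def grad_f(x, y):
    return (-2.0 * x, -2.0 * y)

def grad_h(x, y):
    return (3.0 * (x - 1.0) ** 2, -2.0 * y)

x_star, y_star = 1.0, 0.0
gf = grad_f(x_star, y_star)
gh = grad_h(x_star, y_star)

# With a single effective constraint, the constraint qualification
# (rank of Dh equal to 1) holds exactly when the gradient of h is nonzero.
cq_holds = any(component != 0.0 for component in gh)

# Largest absolute component of Df + lam * Dh for a few trial multipliers.
residuals = [max(abs(gf[0] + lam * gh[0]), abs(gf[1] + lam * gh[1]))
             for lam in (0.0, 1.0, 10.0, 1000.0)]

print(cq_holds, residuals)
```

The residual never moves away from 2, so no choice of multiplier, however large, can satisfy [KT-2] at the maximizer.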


6.1.3 The Kuhn-Tucker Multipliers 


The vector $\lambda^*$ in the Theorem of Kuhn and Tucker is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum $x^*$. As with the Lagrangean multipliers, the Kuhn-Tucker multipliers may also be thought of as measuring the sensitivity of the objective function at $x^*$ to relaxations of the various constraints. Indeed, this interpretation is particularly intuitive in the context of inequality constraints. To wit, if $h_i(x^*) > 0$, then the $i$-th constraint is already slack, so relaxing it further will not help raise the value of the objective function in the maximization exercise, and $\lambda^*_i$ must be zero. On the other hand, if $h_i(x^*) = 0$, then relaxing the $i$-th constraint may help increase the value of the maximization exercise, so we have $\lambda^*_i \ge 0$.2

For a more formal demonstration of this interpretation of $\lambda$, we impose some simplifying assumptions in a manner similar to those used in Chapter 5. First, we will assume throughout this discussion that the constraint functions $h_i$ are all given in parametric form as

\[ h_i(x; c) = h_i(x) + c_i. \]

Under this assumption, we can define a relaxation of the $i$-th constraint as a “small” increase in the value of the parameter $c_i$.

2 The reason we have $\lambda_i \ge 0$ in this case, and not the strict inequality $\lambda_i > 0$, is also intuitive: another constraint, say the $j$-th, may have also been binding at $x^*$, and it may not be possible to raise the objective function without simultaneously relaxing constraints $i$ and $j$.

Let $C \subset \mathbb{R}^l$ be some open set of feasible values for the parameters $(c_1, \ldots, c_l)$. Suppose that for each $c \in C$, there is a global maximum $x^*(c)$ of $f$ on the constraint set

\[ \mathcal{D}(c) = U \cap \{x \in \mathbb{R}^n \mid h_i(x) + c_i \ge 0,\ i = 1, \ldots, l\}. \]

Suppose further that the constraint qualification holds at each $c \in C$, so there exists $\lambda^*(c) \ge 0$ such that

\[ Df(x^*(c)) + \sum_{i=1}^{l} \lambda^*_i(c) Dh_i(x^*(c)) = 0. \]

Finally, suppose that $\lambda^*(\cdot)$ and $x^*(\cdot)$ are $C^1$ functions of the parameters $c$ on the set $C$.

Define $F: C \to \mathbb{R}$ by

\[ F(c) = f(x^*(c)). \]

The demonstration will be complete if we show that $\partial F(c)/\partial c_i = \lambda^*_i(c)$, $i = 1, \ldots, l$.
Pick any $c \in C$. Suppose $i$ is such that $h_i(x^*(c)) + c_i > 0$. Pick any $\hat{c}_i$ such that $\hat{c}_i < c_i$ and

\[ h_i(x^*(c)) + \hat{c}_i > 0. \]

Consider the constraint set $\mathcal{D}(c_{-i}, \hat{c}_i)$ which results when the parameter $c_i$ in constraint $i$ is replaced by $\hat{c}_i$. Since $\hat{c}_i < c_i$, we must have

\[ \mathcal{D}(c_{-i}, \hat{c}_i) \subset \mathcal{D}(c). \]

Since $x^* = x^*(c)$ is a global maximum of $f$ on the larger constraint set $\mathcal{D}(c)$, and since $x^*(c) \in \mathcal{D}(c_{-i}, \hat{c}_i)$ by choice of $\hat{c}_i$, it follows that $x^*(c)$ is also a maximum of $f$ on the constraint set $\mathcal{D}(c_{-i}, \hat{c}_i)$. Therefore,

\[ F(c_{-i}, \hat{c}_i) = f(x^*(c)) = F(c). \]

It follows immediately that $\partial F(c)/\partial c_i = 0$. On the other hand, it is also the case that $h_i(x^*(c)) + c_i > 0$ implies by the Kuhn-Tucker complementary slackness conditions that $\lambda^*_i(c) = 0$. Therefore, we have shown that if constraint $i$ is slack at $c$, we must have

\[ \frac{\partial F}{\partial c_i}(c) = \lambda^*_i(c). \]

It remains to be shown that this relationship also holds if $i$ is an effective constraint at $c$, i.e., if constraint $i$ holds with equality. This may be achieved by adapting the argument used in subsection 5.2.3 of Chapter 5. The details are left to the reader.
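
The sensitivity interpretation can be illustrated numerically on a one-variable problem. The sketch below is an assumption-laden illustration, not part of the text: it takes f(x) = -(x - 2)^2 and the parametric constraint (1 - x) + c >= 0, for which the maximizer is x*(c) = min(2, 1 + c), and it compares a central finite difference of F(c) = f(x*(c)) with the multiplier obtained from the stationarity condition. The names x_star, F, lam_star and dF are hypothetical helpers.

```python
# Hypothetical problem: maximize f(x) = -(x - 2)^2 subject to (1 - x) + c >= 0.
def x_star(c):
    # The unconstrained peak is x = 2; the constraint caps x at 1 + c.
    return min(2.0, 1.0 + c)

def F(c):
    # Value function F(c) = f(x*(c)).
    return -(x_star(c) - 2.0) ** 2

def lam_star(c):
    # Stationarity f'(x) + lam * h'(x) = 0 with h'(x) = -1 gives lam = f'(x*).
    return -2.0 * (x_star(c) - 2.0)

def dF(c, eps=1e-6):
    # Central finite-difference approximation of dF/dc.
    return (F(c + eps) - F(c - eps)) / (2.0 * eps)

for c in (0.0, 0.5, 2.0):
    print(c, dF(c), lam_star(c))
```

At c = 0 and c = 0.5 the constraint is effective and the derivative of the value function matches a strictly positive multiplier; at c = 2 the constraint is slack and both quantities are zero, as the argument above requires.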


6.2 Using the Theorem of Kuhn and Tucker 
6.2.1 A “Cookbook” Procedure 


The cookbook procedure for using the Theorem of Kuhn and Tucker to solve an 
inequality-constrained optimization problem involves essentially the same steps as 
those in using the Theorem of Lagrange in solving equality-constrained problems: 
namely, we form a “Lagrangean” L, then we compute its critical points, and finally, 
we evaluate the objective at each critical point and select the point at which the 
objective is optimized. 

There are, however, some important differences in the details. One of these arises from the fact that the conclusions of the Kuhn-Tucker Theorem differ for local maxima and local minima. This necessitates a difference in the steps to be followed in solving maximization problems, from those to be used in solving minimization problems. These differences are minor; nonetheless, for expositional ease, we postpone discussion of the minimization problem to the end of this subsection, and focus on maximization problems of the form

Maximize $f(x)$ subject to $x \in \mathcal{D} = U \cap \{x \mid h(x) \ge 0\}$.


As the first step in the procedure for solving this problem, we form a function $L: \mathbb{R}^n \times \mathbb{R}^l \to \mathbb{R}$, which we shall continue calling the Lagrangean, defined by:

\[ L(x, \lambda) = f(x) + \sum_{i=1}^{l} \lambda_i h_i(x). \]

The second step in the procedure is to find all solutions $(x, \lambda)$ to the following set of equations:

\[ \frac{\partial L}{\partial x_j}(x, \lambda) = 0, \quad j = 1, \ldots, n, \]
\[ \frac{\partial L}{\partial \lambda_i}(x, \lambda) \ge 0, \quad \lambda_i \ge 0, \quad \lambda_i \frac{\partial L}{\partial \lambda_i}(x, \lambda) = 0, \quad i = 1, \ldots, l. \]

Any solution to this system of equations will be called a “critical point” of $L$. It is important to note that the equations that define the critical points of $L$ differ from the corresponding ones in equality-constrained problems, in particular with respect to the $\lambda$-derivatives. Let $M$ denote the set of all critical points of $L$ for which $x \in U$:

\[ M = \{(x, \lambda) \mid (x, \lambda) \text{ is a critical point of } L \text{ and } x \in U\}. \]

As the third and last step, we compute the value of $f$ at each $x$ in the set

\[ \{x \mid \text{there is } \lambda \text{ such that } (x, \lambda) \in M\}. \]

In practice, the value of $x$ that maximizes $f$ over this set is typically also the solution to the original maximization problem.
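
The second and third steps can be sketched in Python as a checker plus a selection. Everything specific here is an assumption made for illustration: the function name is_critical_point, the tolerance, the sample problem (maximize xy subject to x >= 0, y >= 0, 2 - x - y >= 0), and the hand-supplied candidate list. The sketch does not solve the system of equations; it only tests whether given pairs satisfy it and then picks the best x among those that do.

```python
def is_critical_point(x, lam, grad_f, hs, grad_hs, tol=1e-9):
    """Check the critical-point conditions of L(x, lam) = f(x) + sum_i lam_i h_i(x)."""
    gf = grad_f(x)
    grads = [gh(x) for gh in grad_hs]
    # dL/dx_j = 0 for every coordinate j.
    for j in range(len(x)):
        if abs(gf[j] + sum(l * g[j] for l, g in zip(lam, grads))) > tol:
            return False
    # dL/dlam_i = h_i(x) >= 0, lam_i >= 0, and lam_i * h_i(x) = 0.
    for l, h in zip(lam, hs):
        value = h(x)
        if value < -tol or l < -tol or abs(l * value) > tol:
            return False
    return True

# Illustrative problem (an assumption, not from the text):
# maximize f(x, y) = x*y subject to x >= 0, y >= 0, 2 - x - y >= 0.
f = lambda p: p[0] * p[1]
grad_f = lambda p: (p[1], p[0])
hs = [lambda p: p[0], lambda p: p[1], lambda p: 2.0 - p[0] - p[1]]
grad_hs = [lambda p: (1.0, 0.0), lambda p: (0.0, 1.0), lambda p: (-1.0, -1.0)]

candidates = [
    ((0.0, 0.0), (0.0, 0.0, 0.0)),
    ((1.0, 1.0), (0.0, 0.0, 1.0)),
    ((2.0, 0.0), (0.0, -2.0, 0.0)),  # stationary, but a multiplier is negative
]

# Step 2: keep the candidates that are critical points of L.
M = [(x, lam) for x, lam in candidates
     if is_critical_point(x, lam, grad_f, hs, grad_hs)]

# Step 3: evaluate f on the x-components of M and keep the best one.
best_x = max((x for x, _ in M), key=f)
print(M, best_x)
```

Passing the constraints as parallel lists of functions and gradients keeps the checker independent of the dimension; in a real application the candidate list would come from solving the system case by case over the possible sets of effective constraints.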

The reason this procedure works well in practice, and the conditions that may 
cause its failure, are explored in the following subsections. But first, some notes on 
inequality-constrained minimization problems. Suppose the original problem is to 
solve 


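Minimize $f(x)$ subject to $x \in \mathcal{D} = U \cap \{x \mid h(x) \ge 0\}$.


Two routes are open to us. First, since $x$ minimizes $f$ over $\mathcal{D}$ if, and only if, $x$ maximizes $-f$ over $\mathcal{D}$, we could simply rewrite the problem as a maximization problem with the objective function given by $-f$, and use the procedure listed above. The alternative route is to modify the definition of the Lagrangean to

\[ L(x, \lambda) = f(x) - \sum_{i=1}^{l} \lambda_i h_i(x), \]

and follow the same remaining steps as listed for the maximization problem. That is, in the second step, we find the set $M$ of all the points $(x, \lambda)$ which satisfy $x \in U$ as well as

\[ \frac{\partial L}{\partial x_j}(x, \lambda) = 0, \quad j = 1, \ldots, n, \]
\[ \frac{\partial L}{\partial \lambda_i}(x, \lambda) \le 0, \quad \lambda_i \ge 0, \quad \lambda_i \frac{\partial L}{\partial \lambda_i}(x, \lambda) = 0, \quad i = 1, \ldots, l. \]

(With this definition of $L$, we have $\partial L/\partial \lambda_i = -h_i(x)$, so the first inequality in the second line is simply the feasibility requirement $h_i(x) \ge 0$.) Lastly, we evaluate $f$ at each point $x$ in the set $\{x \mid \text{there is } \lambda \text{ such that } (x, \lambda) \in M\}$. The value of $x$ that minimizes $f$ over this set is typically also a global minimum of the original problem.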
Oi 


6.2.2 Why the Procedure Usually Works 


As earlier with the Theorem of Lagrange, it is not very hard to see why this method 
is usually successful. We discuss the reasons in the context of inequality-constrained 
maximization problems here. With the appropriate modifications, the same arguments 
also hold for minimization problems. 

The key, once again, lies in a property of the Lagrangean $L$:

The set of critical points of $L$ contains the set of all local maxima of $f$ on $\mathcal{D}$ at which the constraint qualification is met. That is, if $x$ is a local maximum of $f$ on $\mathcal{D}$, and the constraint qualification is satisfied at $x$, then there must exist $\lambda$ such that $(x, \lambda)$ is a critical point of $L$.

As earlier, this is an immediate consequence of the definition of $L$ and its critical points. Since we have

\[ \frac{\partial L}{\partial \lambda_j}(x, \lambda) = h_j(x), \]
\[ \frac{\partial L}{\partial x_j}(x, \lambda) = \frac{\partial f}{\partial x_j}(x) + \sum_{i=1}^{l} \lambda_i \frac{\partial h_i}{\partial x_j}(x), \]

a pair $(x, \lambda)$ can be a critical point of $L$ if and only if it satisfies the following conditions:

\[ h_j(x) \ge 0, \quad \lambda_j \ge 0, \quad \lambda_j h_j(x) = 0, \quad j = 1, \ldots, l, \]
\[ Df(x) + \sum_{i=1}^{l} \lambda_i Dh_i(x) = 0. \]

Equivalently, $(x, \lambda)$ is a critical point of $L$ if and only if it satisfies the Kuhn-Tucker first-order conditions for a maximum, i.e., it satisfies $h(x) \ge 0$ as well as conditions [KT-1] and [KT-2] of Theorem 6.1.

Now, suppose $x^*$ is a local maximum of $f$ on $\mathcal{D}$, and the constraint qualification holds at $x^*$. Since $x^*$ is feasible, it must satisfy $h(x^*) \ge 0$. By the Theorem of Kuhn and Tucker, there must also exist $\lambda^*$ such that $(x^*, \lambda^*)$ satisfies conditions [KT-1] and [KT-2] of Theorem 6.1. This says precisely that $(x^*, \lambda^*)$ must be a critical point of $L$, establishing the claimed property.

A special case of this property is: 


Proposition 6.5 Suppose the following conditions hold: 


1. A global maximum x* exists to the given inequality-constrained problem. 
2. The constraint qualification is met at x*. 


Then, there exists $\lambda^*$ such that $(x^*, \lambda^*)$ is a critical point of $L$.


It follows that, under the conditions of Proposition 6.5, the procedure we have 
outlined above will succeed in identifying the maximum x*. Since neither the exis- 
tence of solutions nor the constraint qualification is usually a problem in applications, 
Proposition 6.5 also provides an indirect explanation of why this procedure is quite 
successful in practice. 


6.2.3 When It Could Fail 


Unfortunately, the failure of either condition of Proposition 6.5 could also lead to failure of this procedure in identifying global optima. This subsection provides a number of examples to illustrate this point.

First, even if an optimum does exist, the constraint qualification may fail at the optimum, and as a consequence, the optimum may not turn up as part of a solution to the equations defining the critical points of $L$. It is very important to understand that this does not imply that $L$ will possess no critical points. In each of Examples 6.6 and 6.7, a unique global maximum exists to the stated problem, and in each case the constraint qualification fails at the optimum. In Example 6.6, this results in a situation where $L$ fails to have any critical points. This is not, however, the case in Example 6.7, where $L$ possesses multiple critical points, although the problem's unique maximum is not amongst these.


Example 6.6 As in Example 6.4, let $f$ and $h$ be $C^2$ functions on $\mathbb{R}^2$ defined by $f(x, y) = -(x^2 + y^2)$, and $h(x, y) = (x - 1)^3 - y^2$, respectively. We have seen in Example 6.4 that the unique global maximum of $f$ on $\mathcal{D}$ is achieved at $(x, y) = (1, 0)$, but that the constraint qualification fails at this point; and that, as a consequence, there is no $\lambda \in \mathbb{R}_+$ such that $(1, 0, \lambda)$ meets the conclusions of the Theorem of Kuhn and Tucker. Evidently, then, there is no value of $\lambda$ for which the point $(1, 0, \lambda)$ arises as a critical point of $L(x, y, \lambda) = f(x, y) + \lambda h(x, y)$, and the cookbook procedure fails to identify this unique global optimum.

Indeed, there are no solutions to the equations that define the critical points of $L$ in this problem. These equations are given by:

\[ -2x + 3\lambda(x - 1)^2 = 0 \]
\[ -2y - 2\lambda y = 0 \]
\[ (x - 1)^3 - y^2 \ge 0, \quad \lambda \ge 0, \quad \lambda((x - 1)^3 - y^2) = 0. \]

If $y \ne 0$, then the second equation implies $\lambda = -1$, which violates the third equation. So we must have $y = 0$. If $\lambda$ is also zero, then from the first equation, we have $x = 0$, but $x = y = 0$ violates $(x - 1)^3 - y^2 \ge 0$. On the other hand, if $\lambda > 0$, then (since $y = 0$) the complementary slackness condition implies $(x - 1)^3 = 0$, or $x = 1$, but this violates the first equation. □


Example 6.7 Let $f$ and $g$ be functions on $\mathbb{R}$ defined by $f(x) = 2x^3 - 3x^2$ and $g(x) = (3 - x)^3$, respectively. Consider the problem of maximizing $f$ over the set $\mathcal{D} = \{x \mid g(x) \ge 0\}$.

Since $(3 - x)^3 \ge 0$ if and only if $3 - x \ge 0$, the constraint set is just the interval $(-\infty, 3]$. A simple calculation shows that $f(x) = x^2(2x - 3)$ is nonpositive for $x \le 3/2$, and is strictly positive and strictly increasing for $x > 3/2$. Therefore, the unique global maximum of $f$ on $\mathcal{D}$ occurs at the point $x^* = 3$. At this point, however, we have $g'(x^*) = -3(3 - x^*)^2 = 0$, so the constraint qualification fails. We will show that, as a consequence, the procedure we have outlined will fail to identify $x^*$.

Form the Lagrangean $L(x, \lambda) = f(x) + \lambda g(x)$. The critical points of $L$ are the solutions to

\[ 6x^2 - 6x - 3\lambda(3 - x)^2 = 0, \]
\[ \lambda \ge 0, \quad (3 - x)^3 \ge 0, \quad \lambda(3 - x)^3 = 0. \]

For the complementary slackness condition to hold, we must have either $\lambda = 0$ or $x = 3$. A simple calculation shows that if $\lambda = 0$, there are precisely two solutions to these equations, namely $(x, \lambda) = (0, 0)$ and $(x, \lambda) = (1, 0)$. As we have seen, neither $x = 0$ nor $x = 1$ is a global maximum of the problem. On the other hand, if $x = 3$, the first equation cannot be satisfied, since $6x^2 - 6x - 3\lambda(3 - x)^2 = 6x^2 - 6x = 36 \ne 0$. So the unique global maximum $x^*$ is not part of a solution to the critical points of $L$. □
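
The case analysis in this example is easy to confirm mechanically. The following Python sketch assumes f(x) = 2x^3 - 3x^2 and g(x) = (3 - x)^3 and uses integer arithmetic so that the equality tests are exact; the helper names dL_dx and is_critical are illustrative.

```python
# Assumed data: f(x) = 2x^3 - 3x^2, g(x) = (3 - x)^3, L = f + lam * g.
def f(x):
    return 2 * x ** 3 - 3 * x ** 2

def g(x):
    return (3 - x) ** 3

def dL_dx(x, lam):
    # Derivative of L(x, lam) = f(x) + lam * g(x) with respect to x.
    return 6 * x ** 2 - 6 * x - 3 * lam * (3 - x) ** 2

def is_critical(x, lam):
    return (dL_dx(x, lam) == 0 and lam >= 0
            and g(x) >= 0 and lam * g(x) == 0)

# The two critical points found in the text.
critical_points = [(x, lam) for (x, lam) in [(0, 0), (1, 0)] if is_critical(x, lam)]

# At x = 3 the derivative of L does not depend on lam, so stationarity fails.
stationarity_at_3 = [dL_dx(3, lam) for lam in (0, 1, 5, 100)]

print(critical_points, stationarity_at_3, f(0), f(1), f(3))
```

The objective values make the failure plain: the procedure can only return x = 0 or x = 1, while the feasible point x = 3 gives a strictly larger value of f.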


Alternatively, even if the constraint qualification holds everywhere on the feasible set, the procedure may still fail to identify global optima, because global optima may simply not exist. In this case, the equations that define the critical points of $L$ may have no solutions; alternatively, there may exist solutions to these equations which are not global, or maybe even local, optima. Consider the following examples:


Example 6.8 Let $f$ and $g$ be $C^1$ functions on $\mathbb{R}$ defined by $f(x) = x^2 - x$, and $g(x) = x$, respectively. Consider the problem of maximizing $f$ over the set $\mathcal{D} = \{x \mid g(x) \ge 0\}$. Note that since $g'(x) = 1$ everywhere, the constraint qualification holds everywhere on $\mathcal{D}$.

Define $L(x, \lambda) = f(x) + \lambda g(x)$. The critical points of $L$ are the solutions $(x, \lambda)$ to

\[ 2x - 1 + \lambda = 0, \]
\[ \lambda \ge 0, \quad x \ge 0, \quad \lambda x = 0. \]

This system of equations admits two solutions: $(x, \lambda) = (0, 1)$, and $(x, \lambda) = (\tfrac{1}{2}, 0)$. However, neither point is a solution to the given maximization problem: for instance, at $x = 2$ (which is a feasible point), we have $f(x) = 2$, while $f(0) = 0$ and $f(\tfrac{1}{2}) = -\tfrac{1}{4}$. Indeed, the given problem has no solution at all, since the feasible set is all of $\mathbb{R}_+$, and $f(x) \uparrow \infty$ as $x \uparrow \infty$. □
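
A quick computation confirms both claims of this example. The sketch assumes f(x) = x^2 - x and g(x) = x, and uses exact rational arithmetic from the standard library so that the equality tests are reliable; is_critical is an illustrative helper name.

```python
from fractions import Fraction

# Assumed data: f(x) = x^2 - x, g(x) = x, L(x, lam) = f(x) + lam * g(x).
def f(x):
    return x * x - x

def is_critical(x, lam):
    # dL/dx = 2x - 1 + lam = 0, plus lam >= 0, x >= 0 and lam * x = 0.
    return 2 * x - 1 + lam == 0 and lam >= 0 and x >= 0 and lam * x == 0

half = Fraction(1, 2)
candidates = [(0, 1), (half, 0), (2, 0)]
critical_points = [(x, lam) for x, lam in candidates if is_critical(x, lam)]

# Objective values at the critical points and at the feasible point x = 2.
values = {x: f(x) for x in (0, half, 2)}

print(critical_points, values)
```

The feasible point x = 2 is not a critical point, yet it yields a larger objective value than either critical point, and larger feasible x only increase f further.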


Example 6.9 Let $f$ and $g$ be functions on $\mathbb{R}^2$ defined by $f(x, y) = x + y$ and
$g(x, y) = xy - 1$, respectively. Consider the problem of maximizing $f$ over the
feasible set

$$D = \{(x, y) \in \mathbb{R}^2 \mid g(x, y) \geq 0\}.$$

This problem evidently has no solution: for any real number $x \geq 1$, the vector $(x, x)$
is in $D$, and $f(x, x) = 2x$, which is increasing and unbounded in $x$. We will show
that, nonetheless, the Lagrangean in this problem has a unique critical point.

The Lagrangean $L$ has the form $L(x, y, \lambda) = x + y + \lambda(xy - 1)$. The critical
points of $L$ are the solutions $(x, y, \lambda)$ to

$$1 + \lambda y = 0,$$
$$1 + \lambda x = 0,$$
$$\lambda \geq 0, \quad xy \geq 1, \quad \lambda(1 - xy) = 0.$$

A simple calculation shows that the vector $(x, y, \lambda) = (-1, -1, 1)$ satisfies all these
conditions (and is, in fact, the unique solution to these equations). As we have seen,
this point does not define a solution to the given problem. □
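The claims of Example 6.9 can be spot-checked in a few lines. This is a minimal sketch under the example's own definitions; the function names and the tolerance are assumptions made for illustration.

```python
def f(x, y):
    """Objective of Example 6.9."""
    return x + y


def is_critical(x, y, lam, tol=1e-12):
    """Critical-point system of L(x, y, lam) = x + y + lam * (x*y - 1)."""
    return (abs(1 + lam * y) <= tol               # dL/dx = 0
            and abs(1 + lam * x) <= tol           # dL/dy = 0
            and lam >= 0                          # multiplier sign
            and x * y >= 1 - tol                  # feasibility: g(x, y) >= 0
            and abs(lam * (1 - x * y)) <= tol)    # complementary slackness


critical_point = (-1.0, -1.0, 1.0)

# Feasible points (t, t) with t >= 1, along which f(t, t) = 2t grows.
diagonal_values = [f(t, t) for t in (1.0, 10.0, 100.0)]
```

The critical point yields $f = -2$, which is below even the first diagonal value, so it cannot be a maximum.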


These examples suggest that caution should be employed in using the cookbook 
technique in solving inequality-constrained optimization problems. If one can verify 
existence of global optima and the constraint qualification a priori, then the method 
works well in identifying the optima. On the other hand, in situations where answers 
to these questions are not available a priori, problems arise. 

First, the Lagrangean L may fail to have any critical points for two very different 
reasons. On the one hand, this may be because the problem itself does not have
a solution (Example 6.8). On the other hand, this situation can arise even when a 
solution does exist, since the constraint qualification may be violated at the optimum 
(witness Example 6.6). Thus, the absence of critical points of L does not enable us to 
draw any conclusions about the existence or non-existence of solutions to the given 
problem. 

Second, even if the Lagrangean L has one or more critical points, this set of critical 
points need not contain the solution. Once again, this may be because no solution 
exists to the problem (as in Example 6.9), or because a solution does exist, but one 
at which the constraint qualification is violated (cf. Example 6.7). Thus, even the 
presence of critical points of L does not enable us to draw conclusions about the 
existence of solutions to the given problem. 


6.2.4 A Numerical Example 


Let $g(x, y) = 1 - x^2 - y^2$. Consider the problem of maximizing $f(x, y) = x^2 - y$
over the set

$$D = \{(x, y) \mid g(x, y) \geq 0\}.$$

We will first argue that both conditions of Proposition 6.5 are satisfied, so
the cookbook procedure must succeed in identifying the optimum. The feasible
set $D$ in this problem is just the closed unit disk in $\mathbb{R}^2$, which is evidently
compact. Since the objective function is continuous on $D$, a maximum exists by
the Weierstrass Theorem, so the first of the two conditions of Proposition 6.5 is
satisfied. To see that the second condition is also met, note that at any point where
the problem's single constraint is effective (that is, at any $(x, y)$ where we have
$x^2 + y^2 = 1$), we must have either $x \neq 0$ or $y \neq 0$. Since $Dg(x, y) = (-2x, -2y)$
at all $(x, y)$, it follows that at all points where $g$ is effective, we must have
$\rho(Dg(x, y)) = 1$. Thus, the constraint qualification holds if the optimum occurs
at a point $(x, y)$ where $g(x, y) = 0$. If the optimum occurs at a point $(x, y)$ where
$g(x, y) > 0$, then the set of effective constraints is empty. Since the constraint
qualification pertains only to effective constraints, it holds vacuously in this case
also. Thus, the second of the two conditions of Proposition 6.5 is also met.

Now set up the Lagrangean $L(x, y, \lambda) = x^2 - y + \lambda(1 - x^2 - y^2)$. The critical
points of $L$ are the solutions $(x, y, \lambda)$ to

$$2x - 2\lambda x = 0,$$
$$-1 - 2\lambda y = 0,$$
$$\lambda \geq 0, \quad 1 - x^2 - y^2 \geq 0, \quad \lambda(1 - x^2 - y^2) = 0.$$

For the first equation to hold, we must have $x = 0$ or $\lambda = 1$. If $\lambda = 1$, then from
the second equation, we must have $y = -\tfrac{1}{2}$, while from the third equation, we must
have $x^2 + y^2 = 1$. This gives us two critical points of $L$, which differ only in the
value of $x$:

$$(x, y, \lambda) = \left(\pm\frac{\sqrt{3}}{2},\ -\frac{1}{2},\ 1\right).$$

Note that at either of these critical points, we have $f(x, y) = \tfrac{3}{4} + \tfrac{1}{2} = \tfrac{5}{4}$.

This leaves the case $x = 0$. If we also have $\lambda = 0$, then the second equation
cannot be satisfied, so we must have $\lambda > 0$. This implies from the third equation that
$y^2 = 1$, so $y = \pm 1$. Since $y = 1$ is inconsistent with a positive value for $\lambda$
from the second equation, the only possible critical point in this case is

$$(x, y, \lambda) = \left(0,\ -1,\ \tfrac{1}{2}\right).$$

At this critical point, we have $f(0, -1) = 1 < \tfrac{5}{4}$, which means this point cannot be a
solution to the original maximization problem. Since there are no other critical points,
and we know that any global maximum of $f$ on $D$ must arise as part of a critical
point of $L$, it follows that there are exactly two solutions to the given optimization
problem, namely the points $(x, y) = (\sqrt{3}/2, -1/2)$ and $(x, y) = (-\sqrt{3}/2, -1/2)$.
□
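The conclusion of this numerical example can be cross-checked by brute force. The sketch below, offered under the assumption that a coarse grid is an adequate sanity check rather than a proof, verifies the three critical points against the system and compares their values with a grid search over the unit disk. The names `is_critical`, `candidates`, and `grid_max` are illustrative.

```python
import math


def f(x, y):
    return x**2 - y


def g(x, y):
    return 1 - x**2 - y**2


def is_critical(x, y, lam, tol=1e-9):
    """Critical-point system of L = x^2 - y + lam * (1 - x^2 - y^2)."""
    return (abs(2 * x - 2 * lam * x) <= tol      # dL/dx = 0
            and abs(-1 - 2 * lam * y) <= tol     # dL/dy = 0
            and lam >= 0 and g(x, y) >= -tol     # signs and feasibility
            and abs(lam * g(x, y)) <= tol)       # complementary slackness


r = math.sqrt(3) / 2
candidates = [(r, -0.5, 1.0), (-r, -0.5, 1.0), (0.0, -1.0, 0.5)]
values = [f(x, y) for x, y, _ in candidates]


def grid_max(n=200):
    """Largest value of f over grid points of the closed unit disk."""
    best = -math.inf
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i / n, j / n
            if g(x, y) >= 0:
                best = max(best, f(x, y))
    return best
```

The grid maximum approaches, but never exceeds, the value 5/4 attained at the first two critical points.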


6.3 Illustrations from Economics 


In this section, we present two examples drawn from economics, which illustrate the
use of the procedure outlined in the previous section in finding solutions to inequality-
constrained optimization problems. The first example, presented in subsection 6.3.1,
considers a utility maximization problem. The second example, presented in
subsection 6.3.2, describes a cost minimization problem.

The presentation in this section serves a twofold purpose: 


1. It explains the process of checking for the constraint qualification condition in 
the presence of multiple inequality constraints. Unlike the case with equality- 
constrained problems where this process is relatively straightforward, there is 
the additional complication here that the constraint qualification depends on the 
precise subset of constraints that are effective at the unknown optimal point. 
Thus, the only way, in general, to check that the constraint qualification will 
hold at the optimal point is to take all possible locations of the optimal point, 
and demonstrate that the condition will hold at each of these locations.³

2. It details a method by which the critical points of L may be identified from the 
equations that define these points. This process is again more complicated than 
the corresponding situation in equality-constrained optimization problems, since 
a part of the equations (namely, the complementary slackness conditions) are now
specified in the form of inequalities.


As a practical matter, these examples make an important point: that, whenever this 
is possible, it is better to solve an inequality-constrained optimization problem by 
reducing it to an equality-constrained one, since calculating the critical points of the 
Lagrangean in the latter class of problems is a significantly easier task. 

Some comments on the examples themselves are also important. The two examples
presented in this section share some common features. Most notably, in each
case the formulations considered are such that the problem cannot be reduced to an
equality-constrained problem. However, they are also designed to illustrate different
aspects of solving inequality-constrained optimization problems. For instance,
the set of critical points of the Lagrangean in the first example is very sensitive to
the relationship between the parameters of the problem, whereas this dependence is
much less pronounced in the second example. On the other hand, every critical point
in the first example also identifies a solution of the problem; in contrast, the second
example always has at least one critical point that is not a solution.

³ "All possible locations of the optimal point" does not mean the entire feasible set, since it may be possible
to use the structure of the specific problem at hand to rule out certain portions of the feasible set. See the
examples that follow in this section.


6.3.1 An Illustration from Consumer Theory 


In this subsection, we consider the problem of maximizing the utility function
$u(x_1, x_2) = x_1 + x_2$ on the budget set

$$B(p, I) = \{(x_1, x_2) \mid I - p_1x_1 - p_2x_2 \geq 0,\ x_1 \geq 0,\ x_2 \geq 0\},$$

where $I$, $p_1$, and $p_2$ are all strictly positive terms. There are three inequality
constraints that define this problem:

$$h_1(x_1, x_2) = x_1 \geq 0$$
$$h_2(x_1, x_2) = x_2 \geq 0$$
$$h_3(x_1, x_2) = I - p_1x_1 - p_2x_2 \geq 0.$$

As we have seen in Subsection 5.5.3, the nonnegativity constraints $h_1$ and $h_2$ cannot
be ignored, so this problem cannot be reduced to an equality-constrained one. We
will show that it can be solved through the procedure outlined in Section 6.2.

We begin by showing that both conditions of Proposition 6.5 are met. The budget 
set B(p, I) is evidently compact, since prices and income are all strictly positive, and 
the utility function is continuous on this set. An appeal to the Weierstrass Theorem 
yields the existence of a maximum in this problem, so one of the two requirements 
of Proposition 6.5 is satisfied. 

To check that the other requirement is also met, we first identify all possible
combinations of constraints that can, in principle, be effective at the optimum. Since
there are three inequality constraints, there are a total of eight different combinations
to be checked: namely, $\emptyset$, $h_1$, $h_2$, $h_3$, $(h_1, h_2)$, $(h_1, h_3)$, $(h_2, h_3)$, and $(h_1, h_2, h_3)$.
Of these, the last can be ruled out, since $h_1 = h_2 = 0$ implies $h_3 > 0$. Moreover,
since the utility function is strictly increasing in both arguments, it is obvious that
all available income must be used up at the optimal point, so we must have $h_3 = 0$.
Therefore, there are only three possible values for the set $h_E$ of effective constraints
at the optimum, namely, $h_E = (h_1, h_3)$, $h_E = (h_2, h_3)$, and $h_E = h_3$. We will show
that the constraint qualification holds in each of these cases.

If the optimum occurs at a point where only the first and third constraints are
effective and $h_E = (h_1, h_3)$, we have

$$Dh_E(x_1, x_2) = \begin{bmatrix} 1 & 0 \\ -p_1 & -p_2 \end{bmatrix}$$

at any $(x_1, x_2)$. Since $p_1$ and $p_2$ are strictly positive by hypothesis, this matrix has full
rank, so the constraint qualification will hold at such a point. A similar computation
shows that if the optimum occurs at a point where only the second and third constraints
are effective, the constraint qualification will be met. Finally, if the third constraint
is the only effective one at the optimum and $h_E = h_3$, we have $Dh_E(x_1, x_2) =
(-p_1, -p_2)$, and once again the positivity of the prices implies that there are no
problems here.

In summary, the optimum in this problem must occur at a point where h3 = 0, but 
no matter where on this set the optimum lies (i.e., no matter which other constraints 
are effective at the optimum), the constraint qualification must hold. Therefore, the 
second condition of Proposition 6.5 is also satisfied, and the critical points of the 
Lagrangean must contain the global maxima of the problem. 
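The rank conditions just verified by hand can also be checked mechanically. The following is a minimal sketch, assuming a small Gaussian-elimination `rank` helper written for this illustration (it is not a library routine) and the gradients $Dh_1 = (1, 0)$, $Dh_2 = (0, 1)$, $Dh_3 = (-p_1, -p_2)$ from the text.

```python
def rank(rows, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination with pivoting."""
    m = [list(map(float, r)) for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def effective_gradients(p1, p2):
    """Rows of Dh_E for each effective set E that can occur at the optimum."""
    dh1, dh2, dh3 = [1.0, 0.0], [0.0, 1.0], [-p1, -p2]
    return {
        ("h1", "h3"): [dh1, dh3],
        ("h2", "h3"): [dh2, dh3],
        ("h3",): [dh3],
    }


def constraint_qualification_holds(p1, p2):
    """True if rank(Dh_E) = |E| for every candidate effective set E."""
    return all(rank(rows) == len(names)
               for names, rows in effective_gradients(p1, p2).items())
```

For strictly positive prices the function returns `True`, matching the argument above; the check is independent of where on the budget line the optimum lies because the gradients are constant.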

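The Lagrangean for this problem is:

$$L(x, \lambda) = x_1 + x_2 + \lambda_1x_1 + \lambda_2x_2 + \lambda_3(I - p_1x_1 - p_2x_2).$$

The critical points of the Lagrangean are the solutions $(x_1, x_2, \lambda_1, \lambda_2, \lambda_3)$ to the
following system of equations:

1. $1 + \lambda_1 - \lambda_3p_1 = 0$.
2. $1 + \lambda_2 - \lambda_3p_2 = 0$.
3. $\lambda_1 \geq 0$, $x_1 \geq 0$, $\lambda_1x_1 = 0$.
4. $\lambda_2 \geq 0$, $x_2 \geq 0$, $\lambda_2x_2 = 0$.
5. $\lambda_3 \geq 0$, $I - p_1x_1 - p_2x_2 \geq 0$, $\lambda_3(I - p_1x_1 - p_2x_2) = 0$.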


To solve for all the critical points of this system, we adopt the following procedure.
We fix a subset $C$ of $\{h_1, h_2, h_3\}$, and examine if there are any critical points in which
only the constraints in the set $C$ hold with equality. Then, we vary $C$ over all possible
subsets of $\{h_1, h_2, h_3\}$, and thereby obtain all the critical points.

In general, this procedure would be somewhat lengthy, since there are 8 possible
values for $C$: namely, $\emptyset$, $\{h_1\}$, $\{h_2\}$, $\{h_3\}$, $\{h_1, h_2\}$, $\{h_1, h_3\}$, $\{h_2, h_3\}$, and $\{h_1, h_2, h_3\}$.
However, as we have already mentioned, the optimum in this problem must occur at a
point where $h_3 = 0$, so it suffices to find the set of critical points of $L$ at which $h_3 = 0$.
Moreover, as also mentioned, the case $C = \{h_1, h_2, h_3\}$ has no solutions because
$h_1 = h_2 = 0$ implies $h_3 > 0$. Thus, there are just three cases to be considered: $\{h_3\}$,
$\{h_2, h_3\}$, and $\{h_1, h_3\}$. We examine each of these cases in turn.


Case 1: $C = \{h_3\}$
Since only constraint 3 is assumed to hold with equality in this case, we must have
$x_1 > 0$ and $x_2 > 0$. By the complementary slackness conditions, this implies we
must also have $\lambda_1 = \lambda_2 = 0$. Substituting this into the first two equations that define
the critical points of $L$, we obtain

$$\lambda_3p_1 = \lambda_3p_2 = 1.$$

These conditions make sense only if $p_1 = p_2$. Thus, we retain this case as a possible
solution only if the parameters happen to satisfy $p_1 = p_2$. Otherwise, it is discarded.

In the event that $p_1$ and $p_2$ are equal to a common value $p > 0$, we must evidently
have $\lambda_3 = 1/p$. It is now easily seen that, when $p_1 = p_2 = p$, there are
infinitely many critical points of $L$ that satisfy $x_1 > 0$ and $x_2 > 0$. Indeed, any point
$(x_1, x_2, \lambda_1, \lambda_2, \lambda_3)$ is a critical point provided

$$\lambda_1 = \lambda_2 = 0, \quad \lambda_3 = 1/p, \quad x_1 \in (0, I/p), \quad \text{and} \quad x_2 = (I - px_1)/p.$$

Note that at all these critical points, we have $u(x_1, x_2) = I/p$.


Case 2: $C = \{h_2, h_3\}$
Since $h_2 = h_3 = 0$ translates to $x_2 = 0$ and $I - p_1x_1 - p_2x_2 = 0$, we must have
$x_1 = I/p_1 > 0$ in this case. Therefore, by the complementary slackness condition,
we must have $\lambda_1 = 0$. Of course, we must also have $\lambda_2 \geq 0$. Substituting these into
the first two equations that define the critical points of $L$, we obtain

$$\lambda_3p_1 = 1 \leq 1 + \lambda_2 = \lambda_3p_2.$$

Therefore, such a critical point can exist only if $p_1 \leq p_2$. Assuming this inequality
to be true, it is seen that the unique critical point of $L$ at which $h_2 = h_3 = 0$ is

$$(x_1, x_2, \lambda_1, \lambda_2, \lambda_3) = \left(\frac{I}{p_1},\ 0,\ 0,\ \frac{p_2}{p_1} - 1,\ \frac{1}{p_1}\right).$$

The value of the objective function at this critical point is given by $u(x_1, x_2) = I/p_1$.


Case 3: $C = \{h_1, h_3\}$

This is similar to Case 2. It is possible as a critical point of $L$ only if $p_1 \geq p_2$. The
unique critical point of $L$ in this case is given by:

$$(x_1, x_2, \lambda_1, \lambda_2, \lambda_3) = \left(0,\ \frac{I}{p_2},\ \frac{p_1}{p_2} - 1,\ 0,\ \frac{1}{p_2}\right).$$

The value of the objective function at this critical point is given by $u(x_1, x_2) = I/p_2$.


Summing up, we have the following:

• If $p_1 > p_2$, there is exactly one critical point of $L$ (namely, that arising in Case 3),
whose associated x-values are $(x_1, x_2) = (0, I/p_2)$.

• If $p_1 < p_2$, $L$ has only a single critical point (namely, that arising in Case 2), and
this has the associated x-values $(x_1, x_2) = (I/p_1, 0)$.

• If $p_1 = p_2 = p$, $L$ has infinitely many critical points (viz., all of the critical points
arising in each of the three cases). The set of x-values that arises in this case is
$\{(x_1, x_2) \mid x_1 + x_2 = I/p,\ x_1 \geq 0,\ x_2 \geq 0\}$.

We have already shown that for any $p_1 > 0$ and $p_2 > 0$, the critical points
of $L$ must contain the solution of the problem. Since $L$ has a unique critical point
when $p_1 > p_2$, it follows that this critical point also identifies the problem's global
maximum for this parameter configuration; that is, the unique solution to the problem
when $p_1 > p_2$ is $(x_1, x_2) = (0, I/p_2)$. Similarly, the unique solution to the problem
when $p_1 < p_2$ is $(x_1, x_2) = (I/p_1, 0)$. Finally, when $p_1 = p_2 = p$, there are
infinitely many critical points of $L$, but at all of these we have $u(x_1, x_2) = I/p$.
Therefore, each of these values of $(x_1, x_2)$ defines a global maximum of the problem
in this case.
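The case analysis of this subsection translates directly into a short routine. The sketch below is illustrative only: `critical_points` and `satisfies_system` are hypothetical helper names, Case 1 is represented by a single arbitrarily chosen interior point, and Cases 2 and 3 are both included when $p_1 = p_2$, as in the summary above.

```python
def critical_points(p1, p2, income):
    """Critical points of L by case, as (case, x1, x2, lam1, lam2, lam3)."""
    points = []
    if p1 == p2:  # Case 1: interior points exist only when prices coincide
        x1 = income / (2 * p1)  # midpoint of (0, I/p), an arbitrary pick
        points.append(("case 1", x1, (income - p1 * x1) / p1, 0.0, 0.0, 1 / p1))
    if p1 <= p2:  # Case 2: x2 = 0, all income spent on good 1
        points.append(("case 2", income / p1, 0.0, 0.0, p2 / p1 - 1, 1 / p1))
    if p1 >= p2:  # Case 3: x1 = 0, all income spent on good 2
        points.append(("case 3", 0.0, income / p2, p1 / p2 - 1, 0.0, 1 / p2))
    return points


def satisfies_system(x1, x2, l1, l2, l3, p1, p2, income, tol=1e-9):
    """Check equations 1-5 that define the critical points of L."""
    slack = income - p1 * x1 - p2 * x2
    return (abs(1 + l1 - l3 * p1) <= tol
            and abs(1 + l2 - l3 * p2) <= tol
            and l1 >= 0 and x1 >= 0 and abs(l1 * x1) <= tol
            and l2 >= 0 and x2 >= 0 and abs(l2 * x2) <= tol
            and l3 >= 0 and slack >= -tol and abs(l3 * slack) <= tol)
```

For example, `critical_points(1.0, 2.0, 10.0)` returns only the Case 2 point with $(x_1, x_2) = (10, 0)$, while equal prices produce one point from each case, all with the same utility.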


6.3.2 An Illustration from Producer Theory 


The problem we consider in this section is that of minimizing $w_1x_1 + w_2x_2$ over
the feasible set

$$D = \{(x_1, x_2) \in \mathbb{R}^2_+ \mid x_1^2 + x_2^2 \geq y\}.$$

The constraint set of this problem is defined through three inequality constraints,
namely

$$h_1(x_1, x_2) = x_1 \geq 0$$
$$h_2(x_1, x_2) = x_2 \geq 0$$
$$h_3(x_1, x_2) = x_1^2 + x_2^2 - y \geq 0.$$

As usual, we shall assume that all the parameters of the problem (namely, $w_1$, $w_2$,
and $y$) are strictly positive.

Once again, we begin our analysis with a demonstration that both conditions of 
Proposition 6.5 are satisfied. The existence of solutions may be demonstrated in many 
ways, for instance, by compactifying the feasible action set in the manner described 
in Example 3.8. The details are omitted. 

To check that the constraint qualification will hold at the optimum, we first identify
all possible combinations of the constraints that can, in principle, hold with equality
at the optimum. Since there are three constraints, there are eight cases to be checked:
$\emptyset$, $h_1$, $h_2$, $h_3$, $(h_1, h_2)$, $(h_1, h_3)$, $(h_2, h_3)$, and $(h_1, h_2, h_3)$. Of these, the last can
be ruled out since $h_1 = h_2 = 0$ implies $x_1^2 + x_2^2 = 0$, whereas the constraint set
requires $x_1^2 + x_2^2 \geq y$. It is also apparent that, since $w_1$ and $w_2$ are strictly positive, we
must have $h_3 = 0$ at the optimum (i.e., total production $x_1^2 + x_2^2$ must exactly equal
$y$), or costs could be reduced by reducing output. This means there are only three
possible descriptions of the set $h_E$ of effective constraints at the optimum: $h_E = h_3$,
$h_E = (h_1, h_3)$, and $h_E = (h_2, h_3)$. We will show that in each case, the constraint
qualification must hold.

First, consider the case $h_E = (h_1, h_3)$. Since $h_1$ and $h_3$ are effective, we have
$x_1 = 0$ and $x_1^2 + x_2^2 = y$, so $x_2 = \sqrt{y}$. Therefore,

$$Dh_E(x_1, x_2) = \begin{bmatrix} 1 & 0 \\ 2x_1 & 2x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2\sqrt{y} \end{bmatrix}.$$

Since this matrix evidently has the required rank, it follows that if the optimum
occurs at a point where $h_1$ and $h_3$ are the only effective constraints, the constraint
qualification will be met.

An identical argument, with the obvious changes, shows that the constraint
qualification is not a problem if the optimum happens to occur at a point where $h_2$ and $h_3$
are effective. This leaves the case $h_E = h_3$. In this case, we have

$$Dh_E(x_1, x_2) = (2x_1, 2x_2).$$

Since we are assuming that only $h_3$ holds with equality, we must have $x_1, x_2 > 0$,
so $\rho(Dh_E(x_1, x_2)) = |E| = 1$ as required.

Summing up, the optimum must occur at a point where 43 = 0, but no matter 
where on this set the optimum occurs, the constraint qualification will be met. It 
follows that the set of critical points of the Lagrangean must contain the solution(s) 
of the problem. 

The Lagrangean $L$ in this problem has the form

$$L(x_1, x_2, \lambda_1, \lambda_2, \lambda_3) = -w_1x_1 - w_2x_2 + \lambda_1x_1 + \lambda_2x_2 + \lambda_3(x_1^2 + x_2^2 - y).$$

Note that we have implicitly set the problem up as a maximization problem, with
objective $-w_1x_1 - w_2x_2$. The critical points of $L$ are the solutions $(x_1, x_2, \lambda_1, \lambda_2, \lambda_3)$
to the following system of equations:

1. $-w_1 + \lambda_1 + 2\lambda_3x_1 = 0$.
2. $-w_2 + \lambda_2 + 2\lambda_3x_2 = 0$.
3. $\lambda_1 \geq 0$, $x_1 \geq 0$, $\lambda_1x_1 = 0$.
4. $\lambda_2 \geq 0$, $x_2 \geq 0$, $\lambda_2x_2 = 0$.
5. $\lambda_3 \geq 0$, $x_1^2 + x_2^2 - y \geq 0$, $\lambda_3(x_1^2 + x_2^2 - y) = 0$.

As in subsection 6.3.1, we fix a subset $C$ of $\{h_1, h_2, h_3\}$ and find the set of all
possible solutions to these equations when only the constraints in $C$ hold with equality.
As $C$ ranges over all possible subsets of $\{h_1, h_2, h_3\}$, we obtain the set of all critical
points of $L$.

Once again, this process is simplified by the fact that we do not have to consider all
possible subsets $C$. First, as we have mentioned, $h_3$ must be effective at an optimum,
so it suffices to find all critical points of $L$ at which $h_3 = 0$. Second, the case
$C = \{h_1, h_2, h_3\}$ is ruled out, since $h_1 = h_2 = 0$ violates the third constraint that
$h_3 \geq 0$. This results in three possible values for $C$, namely, $C = \{h_3\}$, $C = \{h_2, h_3\}$,
and $C = \{h_1, h_3\}$. We consider each of these in turn.


Case 1: $C = \{h_3\}$
Since we must have $x_1 > 0$ and $x_2 > 0$ in this case, the complementary slackness
conditions imply $\lambda_1 = \lambda_2 = 0$. Substituting these values into the first two equations
that define the critical points of $L$, we obtain

$$2\lambda_3x_1 = w_1$$
$$2\lambda_3x_2 = w_2.$$

Dividing the first equation by the second, this implies

$$x_1 = \left(\frac{w_1}{w_2}\right) x_2.$$

By hypothesis, we also have $h_3 = 0$, or $x_1^2 + x_2^2 = y$. Therefore, we have

$$x_1 = \left(\frac{w_1^2}{w_1^2 + w_2^2}\right)^{1/2} y^{1/2}, \quad x_2 = \left(\frac{w_2^2}{w_1^2 + w_2^2}\right)^{1/2} y^{1/2}, \quad \lambda_3 = \frac{1}{2}\left(\frac{w_1^2 + w_2^2}{y}\right)^{1/2}.$$

Combined with $\lambda_1 = \lambda_2 = 0$, this represents the unique critical point of $L$ in
which $h_1 > 0$, $h_2 > 0$, and $h_3 = 0$. Note that the value of the objective function
$-w_1x_1 - w_2x_2$ at this critical point is

$$-w_1x_1 - w_2x_2 = -(w_1^2 + w_2^2)^{1/2} y^{1/2}.$$


Case 2: $C = \{h_2, h_3\}$
In this case, we have $x_1 > 0$ (and, in fact, $x_1 = \sqrt{y}$, since $h_2 = h_3 = 0$ implies
$x_2 = 0$ and $x_1^2 + x_2^2 - y = 0$). Therefore, we must also have $\lambda_1 = 0$. Substituting
$x_1 = \sqrt{y}$ and $\lambda_1 = 0$ into the first of the equations that define the critical points of
$L$, we obtain

$$2\lambda_3x_1 - w_1 = 2\lambda_3\sqrt{y} - w_1 = 0,$$

or $\lambda_3 = w_1/(2\sqrt{y})$. Finally, substituting $x_2 = 0$ into the second equation defining
the critical points of $L$, we get $\lambda_2 = w_2$. Thus, the unique critical point of $L$ which
satisfies $h_2 = h_3 = 0$ is

$$(x_1, x_2, \lambda_1, \lambda_2, \lambda_3) = \left(\sqrt{y},\ 0,\ 0,\ w_2,\ \frac{w_1}{2\sqrt{y}}\right).$$

The value of the objective function ($-w_1x_1 - w_2x_2$) at this critical point is $-w_1y^{1/2}$.

Case 3: $C = \{h_1, h_3\}$

This is the same as Case 2, with the obvious changes. The unique critical point of $L$
that falls in this case is

$$(x_1, x_2, \lambda_1, \lambda_2, \lambda_3) = \left(0,\ \sqrt{y},\ w_1,\ 0,\ \frac{w_2}{2\sqrt{y}}\right),$$

and the value of the objective function ($-w_1x_1 - w_2x_2$) at this critical point is
$-w_2y^{1/2}$.


Summing up, $L$ has three critical points, and the values taken by the objective
function at these three points are $-(w_1^2 + w_2^2)^{1/2}y^{1/2}$, $-w_1y^{1/2}$, and $-w_2y^{1/2}$. Now,
we have already established that the set of critical points of $L$ in this problem contains
the solution(s) to the problem; and, therefore, that the point(s) that maximize the value
of the objective function among the set of critical points must be the solution(s) to
the problem. All that remains now is to compare the value of the objective function
at the three critical points.

Since $w_1 > 0$ and $w_2 > 0$, it is always the case that $(w_1^2 + w_2^2)^{1/2} > w_1$ and,
therefore, that

$$-(w_1^2 + w_2^2)^{1/2}y^{1/2} < -w_1y^{1/2}.$$

We may, as a consequence, ignore the first value of the objective function, and the
point that it represents. Comparing the value of the objective function at the remaining
two points, it can be seen that

• When $w_1 < w_2$, the larger of the two values is $-w_1y^{1/2}$, which arises at
$(x_1, x_2) = (y^{1/2}, 0)$. Therefore, the problem has a unique solution when $w_1 < w_2$,
namely $(y^{1/2}, 0)$.

• When $w_1 > w_2$, the larger of the two values is $-w_2y^{1/2}$, which arises at
$(x_1, x_2) = (0, y^{1/2})$. Therefore, the problem has a unique solution when $w_1 > w_2$,
namely $(0, y^{1/2})$.

• When $w_1$ and $w_2$ have a common value $w > 0$, these two values of the
objective function coincide. Therefore, if $w_1 = w_2$, the problem has two solutions,
namely $(y^{1/2}, 0)$ and $(0, y^{1/2})$.
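The final comparison among the three critical points can be sketched in code. This is an illustrative check under the formulas derived above, not a general solver; `cost_min_candidates`, `solve`, and `boundary_scan` are hypothetical names. The scan walks along the arc $x_1^2 + x_2^2 = y$ in the nonnegative quadrant, where the text has argued the optimum must lie.

```python
import math


def cost_min_candidates(w1, w2, y):
    """Map each case to (x-values, objective value -w1*x1 - w2*x2)."""
    s = math.sqrt(y)
    norm = math.hypot(w1, w2)                    # (w1^2 + w2^2)^(1/2)
    interior = (w1 * s / norm, w2 * s / norm)    # Case 1: x1, x2 > 0
    return {
        "case 1": (interior, -(norm * s)),
        "case 2": ((s, 0.0), -w1 * s),
        "case 3": ((0.0, s), -w2 * s),
    }


def solve(w1, w2, y):
    """Critical points that attain the largest objective value."""
    cands = cost_min_candidates(w1, w2, y)
    best = max(v for _, v in cands.values())
    return [x for x, v in cands.values() if v == best]


def boundary_scan(w1, w2, y, n=1000):
    """Smallest cost found on a grid of the arc x1^2 + x2^2 = y, x >= 0."""
    s = math.sqrt(y)
    angles = (k * math.pi / (2 * n) for k in range(n + 1))
    return min(w1 * s * math.cos(t) + w2 * s * math.sin(t) for t in angles)
```

With $w_1 = 1$, $w_2 = 2$, $y = 4$, `solve` returns `[(2.0, 0.0)]`, and the scan finds a minimum cost of 2 at the same corner; the Case 1 point is always the costliest of the three candidates.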


6.4 The General Case: Mixed Constraints 


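A constrained optimization problem with mixed constraints is a problem where the
constraint set $D$ has the form

$$D = U \cap \{x \in \mathbb{R}^n \mid g(x) = 0,\ h(x) \geq 0\},$$

where $U \subset \mathbb{R}^n$ is open, $g: \mathbb{R}^n \to \mathbb{R}^k$, and $h: \mathbb{R}^n \to \mathbb{R}^l$. For notational ease, define
$\varphi = (\varphi_1, \ldots, \varphi_{k+l}): \mathbb{R}^n \to \mathbb{R}^{k+l}$, where

$$\varphi_i = \begin{cases} g_i, & \text{if } i \in \{1, \ldots, k\} \\ h_{i-k}, & \text{if } i \in \{k+1, \ldots, k+l\}. \end{cases}$$

The following theorem is a simple consequence of combining the Theorem of
Lagrange with the Theorem of Kuhn and Tucker:

Theorem 6.10 Let $f: \mathbb{R}^n \to \mathbb{R}$, and $\varphi_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, l+k$, be $C^1$ functions.
Suppose $x^*$ maximizes $f$ on

$$D = U \cap \{x \in \mathbb{R}^n \mid \varphi_i(x) = 0,\ i = 1, \ldots, k,\ \varphi_j(x) \geq 0,\ j = k+1, \ldots, k+l\},$$

where $U \subset \mathbb{R}^n$ is open. Let $E \subset \{1, \ldots, k+l\}$ denote the set of effective constraints at
$x^*$, and let $\varphi_E = (\varphi_i)_{i \in E}$. Suppose $\rho(D\varphi_E(x^*)) = |E|$. Then, there exists $\lambda \in \mathbb{R}^{l+k}$
such that

1. $\lambda_j \geq 0$ and $\lambda_j\varphi_j(x^*) = 0$ for $j \in \{k+1, \ldots, k+l\}$.
2. $Df(x^*) + \sum_{i=1}^{k+l} \lambda_i D\varphi_i(x^*) = 0$.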
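A small worked instance may help fix the roles of the two kinds of multipliers in Theorem 6.10. The problem below is an illustrative choice made here, not one taken from the text: it has one equality constraint $\varphi_1$ and one inequality constraint $\varphi_2$, so that $k = l = 1$.

```latex
\begin{align*}
&\text{Maximize } f(x, y) = y - x \quad \text{subject to} \quad
  \varphi_1(x, y) = x^2 + y^2 - 1 = 0, \qquad \varphi_2(x, y) = x \geq 0. \\
&\text{On the feasible arc } (\cos t, \sin t),\ t \in [-\pi/2, \pi/2],\
  f = \sin t - \cos t \text{ is maximized at } t = \pi/2,
  \text{ so } x^* = (0, 1). \\
&\text{Both constraints are effective at } x^*, \text{ and }
  D\varphi_E(x^*) = \begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix}
  \text{ has rank } 2 = |E|. \\
&Df(x^*) + \lambda_1 D\varphi_1(x^*) + \lambda_2 D\varphi_2(x^*)
  = (-1, 1) + \lambda_1 (0, 2) + \lambda_2 (1, 0) = (0, 0)
  \;\Longrightarrow\; \lambda_1 = -\tfrac{1}{2},\ \lambda_2 = 1.
\end{align*}
```

Here $\lambda_2 \geq 0$ and $\lambda_2\varphi_2(x^*) = 0$, as the first condition of the theorem requires for the inequality constraint, while the multiplier $\lambda_1$ on the equality constraint is negative, which the theorem permits.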


6.5 A Proof of the Theorem of Kuhn and Tucker 
Let $x^*$ be a local maximum of $f$ on the set

$$D = U \cap \{x \in \mathbb{R}^n \mid h(x) \geq 0\},$$

where $h = (h_1, \ldots, h_l)$ is a $C^1$ function from $\mathbb{R}^n$ to $\mathbb{R}^l$, and $U \subset \mathbb{R}^n$ is open. Let $E$
be the set of effective constraints at $x^*$, and suppose that $\rho(Dh_E(x^*)) = |E|$, where
$h_E = (h_i)_{i \in E}$. We are to show that there is $\lambda \in \mathbb{R}^l$ such that

1. $\lambda_i \geq 0$, and $\lambda_ih_i(x^*) = 0$, $i = 1, \ldots, l$.
2. $Df(x^*) + \sum_{i=1}^{l} \lambda_i Dh_i(x^*) = 0$.

With the important exception of the nonnegativity of the vector $\lambda$, the Theorem of
Kuhn and Tucker can be derived as a consequence of the Theorem of Lagrange. To
simplify notation in the proof, we will denote $|E|$ by $k$; we will also assume that the
effective constraints at $x^*$ are the first $k$ constraints:

$$h_i(x^*) = 0, \quad i = 1, \ldots, k$$
$$h_i(x^*) > 0, \quad i = k+1, \ldots, l.$$

There is, of course, no loss of generality in this assumption, since this may be achieved
simply by renumbering constraints.

For each $i \in \{1, \ldots, l\}$, define

$$V_i = \{x \in \mathbb{R}^n \mid h_i(x) > 0\}.$$

Let $V = \bigcap_{i=k+1}^{l} V_i$. By the continuity of $h_i$, $V_i$ is open for each $i$, and so, therefore,
is $V$. Now, let $D^* \subset D$ be the equality-constrained set defined through $k$ equality
constraints and given by

$$D^* = U \cap V \cap \{x \in \mathbb{R}^n \mid h_i(x) = 0,\ i = 1, \ldots, k\}.$$

By construction, we have $x^* \in D^*$. Since $x^*$ is a local maximum of $f$ on $D$, it
is certainly a local maximum of $f$ on $D^*$. Moreover, we have $\rho(Dh_E(x^*)) = k$,
by hypothesis. Therefore, by the Theorem of Lagrange, there exists a vector $\mu =
(\mu_1, \ldots, \mu_k) \in \mathbb{R}^k$ such that

$$Df(x^*) + \sum_{i=1}^{k} \mu_i Dh_i(x^*) = 0.$$

Now define $\lambda \in \mathbb{R}^l$ by

$$\lambda_i = \begin{cases} \mu_i, & i = 1, \ldots, k \\ 0, & i = k+1, \ldots, l. \end{cases}$$

We will show that the vector $\lambda$ satisfies the properties stated in the theorem.
First, observe that for $i = k+1, \ldots, l$, we have $\lambda_i = 0$, and, therefore,
$\lambda_i Dh_i(x^*) = 0$. Therefore,

$$Df(x^*) + \sum_{i=1}^{l} \lambda_i Dh_i(x^*) = Df(x^*) + \sum_{i=1}^{k} \lambda_i Dh_i(x^*) = Df(x^*) + \sum_{i=1}^{k} \mu_i Dh_i(x^*) = 0,$$

which establishes one of the desired properties.

Now, for any $i \in \{1, \ldots, k\}$, we have $h_i(x^*) = 0$, so certainly it is the case that
$\lambda_ih_i(x^*) = 0$ for $i \in \{1, \ldots, k\}$. For $i \in \{k+1, \ldots, l\}$, we have $\lambda_i = 0$, so it is also
the case that $\lambda_ih_i(x^*) = 0$ for $i \in \{k+1, \ldots, l\}$. Summing up, we have as required

$$\lambda_ih_i(x^*) = 0, \quad i = 1, \ldots, l.$$

It remains to be shown that $\lambda \geq 0$. Since $\lambda_i = 0$ for $i = k+1, \ldots, l$, we are
required only to show that $\lambda_i \geq 0$ for $i = 1, \ldots, k$. We will establish that $\lambda_1 \geq 0$. A
similar argument will establish that $\lambda_i \geq 0$ for $i = 2, \ldots, k$.

To this end, define for $x \in \mathbb{R}^n$ and $\eta \in \mathbb{R}$, the function $H = (H_1, \ldots, H_k)$ from
$\mathbb{R}^{n+1}$ to $\mathbb{R}^k$ by

$$H_1(x, \eta) = h_1(x) - \eta$$
$$H_i(x, \eta) = h_i(x), \quad i = 2, \ldots, k.$$

Let $DH_x(x, \eta)$ denote the $k \times n$ derivative matrix of $H$ with respect to the $x$ variables
alone, and $DH_\eta(x, \eta)$ denote the $k \times 1$ derivative vector of $H$ with respect to $\eta$. For
future reference, we note that since $h_E = (h_1, \ldots, h_k)$, we have $DH_x(x, \eta) =
Dh_E(x)$; we also have $DH_\eta(x, \eta) = (-1, 0, \ldots, 0)$ at any $(x, \eta)$.

By the definition of $H$, we have $H(x^*, 0) = 0$. Moreover, $\rho(DH_x(x^*, 0)) =
\rho(Dh_E(x^*)) = k$. Therefore, by the Implicit Function Theorem (see Chapter 1,
Theorem 1.77), there is a neighborhood $N$ of zero, and a $C^1$ function $\xi: N \to \mathbb{R}^n$,
such that $\xi(0) = x^*$, and

$$H(\xi(\eta), \eta) = 0, \quad \eta \in N.$$

Differentiating this expression using the chain rule, and evaluating at $\xi(0) = x^*$, we
obtain

$$DH_x(x^*, 0)D\xi(0) + DH_\eta(x^*, 0) = 0.$$

Since $DH_x(x, \eta) = Dh_E(x)$ and $DH_\eta(x, \eta) = (-1, 0, \ldots, 0)$ at any $(x, \eta)$, this
implies in turn

$$Dh_E(x^*)D\xi(0) = (1, 0, \ldots, 0),$$

or, equivalently, that

$$Dh_1(x^*)D\xi(0) = 1$$
$$Dh_i(x^*)D\xi(0) = 0, \quad i = 2, \ldots, k.$$

Since $\lambda_i = 0$ for $i = k+1, \ldots, l$, we now have

$$Df(x^*)D\xi(0) = -\left(\sum_{i=1}^{l} \lambda_i Dh_i(x^*)\right)D\xi(0) = -\sum_{i=1}^{k} \lambda_i Dh_i(x^*)D\xi(0) = -\lambda_1.$$

To complete the proof, we will show that

$$Df(x^*)D\xi(0) \leq 0.$$

To this end, we first show that there is $\eta^* > 0$ such that for all $\eta \in [0, \eta^*)$, we
must have $\xi(\eta) \in D$, where $D$ is the constraint set of the original problem.


If $\eta \geq 0$, then $H_i(\xi(\eta), \eta) = 0$ for $i = 1, \ldots, k$, and from the definition of the
functions $H_i$, this means

$$h_1(\xi(\eta)) = \eta \geq 0,$$

and

$$h_i(\xi(\eta)) = 0, \quad i = 2, \ldots, k.$$

For $i = k+1, \ldots, l$, we have $h_i(\xi(0)) = h_i(x^*) > 0$. Since both $h_i(\cdot)$ and $\xi(\cdot)$ are
continuous, it follows that by choosing $\eta$ sufficiently small (say, $\eta \in [0, \eta^*)$), we can
also ensure that

$$h_i(\xi(\eta)) > 0, \quad i = k+1, \ldots, l.$$

Finally, by shrinking the value of $\eta^*$ if need be, we can evidently ensure that $\xi(\eta) \in U$
for all $\eta \in [0, \eta^*)$. Thus, there is $\eta^* > 0$ such that $\xi(\eta) \in D$ for $\eta \in [0, \eta^*)$, as
claimed.

Now, since $\xi(0) = x^*$ is a local maximum of $f$ on $D$, and $\xi(\eta)$ is in the feasible
set for $\eta \in [0, \eta^*)$, it follows that for $\eta > 0$ and sufficiently close to zero, we must
have

$$f(x^*) \geq f(\xi(\eta)).$$

Therefore,

$$\frac{f(x^*) - f(\xi(\eta))}{\eta} \geq 0$$

for all $\eta > 0$ and sufficiently small. Taking limits as $\eta \downarrow 0$, we obtain

$$Df(\xi(0))D\xi(0) \leq 0,$$

or $Df(x^*)D\xi(0) \leq 0$. Since $Df(x^*)D\xi(0) = -\lambda_1$, it follows that $\lambda_1 \geq 0$, and the
proof is complete. □
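The identity $Df(x^*)D\xi(0) = -\lambda_1$ at the heart of the proof can be illustrated on the numerical example of subsection 6.2.4, where $f(x, y) = x^2 - y$, $h(x, y) = 1 - x^2 - y^2$, $x^* = (\sqrt{3}/2, -1/2)$, and $\lambda = 1$. The sketch below uses a hand-picked path $\xi(\eta) = \sqrt{1 - \eta}\, x^*$, which satisfies $h(\xi(\eta)) = \eta$; this is an assumption made for convenience and is not the path produced by the Implicit Function Theorem in the proof, although the identity holds for any such $C^1$ path when only one constraint is effective.

```python
import math


def f(x, y):
    return x**2 - y


def h(x, y):
    return 1 - x**2 - y**2


def xi(eta):
    """A path through x* = (sqrt(3)/2, -1/2) along which h(xi(eta)) = eta."""
    s = math.sqrt(1 - eta)
    return (s * math.sqrt(3) / 2, -s / 2)


def slope(eta=1e-6):
    """Forward-difference estimate of d/d(eta) f(xi(eta)) at eta = 0."""
    return (f(*xi(eta)) - f(*xi(0.0))) / eta
```

The estimated slope is close to $-1 = -\lambda$: relaxing the constraint inward by $\eta$ lowers the attainable objective at rate $\lambda$, which is why a local maximum forces $\lambda \geq 0$.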


6.6 Exercises 


1. Solve the following maximization problem: 


Maximize $\ln x + \ln y$
subject to $x^2 + y^2 = 1$
$x, y \geq 0$.


2. A firm produces two outputs $y$ and $z$ using a single input $x$. The set of attainable
output levels $H(x)$ from an input use of $x$ is given by

$$H(x) = \{(y, z) \in \mathbb{R}^2_+ \mid y^2 + z^2 \leq x\}.$$

The firm has available to it a maximum of one unit of the input $x$. Letting $p_y$
and $p_z$ denote the prices of the two outputs, determine the firm's optimal output
mix.

3. A consumer has income $I > 0$, and faces a price vector $p \in \mathbb{R}^3_{++}$ for the three
commodities she consumes. All commodities must be consumed in nonnegative
amounts. Moreover, she must consume at least two units of commodity 2, and
cannot consume more than one unit of commodity 1. Assuming $I = 4$ and
$p = (1, 1, 1)$, calculate the optimal consumption bundle if the utility function is
given by $u(x_1, x_2, x_3) = x_1x_2x_3$. What if $I = 6$ and $p = (1, 2, 3)$?


4. Let $T > 1$ be some finite integer. Solve the following maximization problem:

Maximize $\sum_{t=1}^{T} \left(\tfrac{1}{2}\right)^t \sqrt{x_t}$
subject to $\sum_{t=1}^{T} x_t \leq 1$
$x_t \geq 0$, $t = 1, \ldots, T$.


5. A firm produces an output $y$ using two inputs $x_1$ and $x_2$ as $y = \sqrt{x_1x_2}$. Union
agreements obligate the firm to use at least one unit of $x_1$ in its production process.
The input prices of $x_1$ and $x_2$ are given by $w_1$ and $w_2$, respectively. Assume that
the firm wishes to minimize the cost of producing $y$ units of output.

(a) Set up the firm's cost-minimization problem. Is the feasible set closed? Is it
compact?

(b) Does the cookbook Lagrangean procedure identify a solution of this problem?
Why or why not?

6. A firm manufactures two outputs $y_1$ and $y_2$ using two inputs $x_1$ and $x_2$. The
production function $f: \mathbb{R}^2_+ \to \mathbb{R}^2_+$ is given by


1/2 1/2 1/3 

(yi, y2) = xx) = Gy! zy x,/ ). 
Let $p_i$ denote the unit price of $y_i$, and $w_i$ that of $x_i$. Describe the firm's
optimization problem and derive the equations that define the critical points of the
Lagrangean $L$. Calculate the solution when $p_1 = p_2 = 1$ and $w_1 = w_2 = 2$.


. A consumer with a utility function given by u(x1, x2) = f/x) + xıx2 has an 

income of 100. The unit prices of xı and x2 are 4 and 5, respectively. 

(a) Compute the utility-maximizing commodity bundle, if consumption must 
be nonnegative. 

(b) Now suppose the consumer were offered the following option. By paying a
lump sum of $a$ to the government, the consumer can obtain coupons which
will enable him to purchase commodity 1 at a price of 3. (The price of
commodity 2 remains unchanged.) For what values of $a$ will the consumer
accept this offer? For what values of $a$ is he indifferent between accepting
the offer and not accepting it?


8. An agent allocates the $H$ hours of time available to her between labor ($l$) and
leisure ($H - l$). Her only source of income is from the wages she obtains by
working. She earns $w$ per hour of labor; thus, if she works $l \in [0, H]$ hours,
her total income is $wl$. She spends her income on food ($f$) and entertainment
($e$), which cost $p$ and $q$ per unit respectively. Her utility function is given by
$u(f, e, l)$ and is increasing in $f$ and $e$, and is decreasing in $l$.

(a) Describe the consumer’s utility maximization problem. 

(b) Describe the equations that define the critical points of the Lagrangean L. 

(c) Assuming H = 16, w = 3, and p = q = 1, find the utility-maximizing 
consumption bundle if u( f, e, D) = f'e!/ — P. 


9. An agent who consumes three commodities has a utility function given by

$$u(x_1, x_2, x_3) = x_1^{1/3} + \min\{x_2, x_3\}.$$

Given an income of $I$, and prices of $p_1$, $p_2$, $p_3$, describe the consumer's utility
maximization problem. Can the Weierstrass and/or Kuhn-Tucker theorems be
used to obtain and characterize a solution? Why or why not?


. A firm has contracted with its union to hire at least $L^*$ units of labor at a wage rate
of $w_1$ per unit. Any amount of additional labor may be hired at a rate of $w_2$ per
unit, where $w_1 > w_2$. Assume that labor is the only input used by the firm in its
production process, and that the production function is given by $f: \mathbb{R}_+ \to \mathbb{R}_+$,
where $f$ is $C^1$ and concave. Given that the output sells at a price of $p$, describe
the firm’s maximization problem. Derive the equations that define the critical 
points of the Lagrangean L in this problem. Do the critical points of L identify 
a solution of the problem? 


A firm produces the output $y$ using two inputs $x_1$ and $x_2$ in nonnegative quantities
through the production relationship 


y = gx) = x xt, 

The firm obtains a price of $p_y > 0$ per unit of $y$ that it sells. It has available an
inventory of $K_1$ units of the input $x_1$ and $K_2$ units of the input $x_2$. More units of
$x_1$ and $x_2$ may be purchased from the market at the unit prices of $p_1 > 0$ and
$p_2 > 0$, respectively. Alternatively, the firm can also sell any unused amount of
its inputs to the market at these prices. 


(a) Describe the firm’s profit-maximization problem, and derive the equations 
that define the critical points of the Lagrangean L. 
(b) Assuming $p_y = p_1 = p_2 = 1$, $K_1 = 4$, and $K_2 = 0$, solve for the firm’s
optimal level of output $y$.

(c) Assume again that $p_y = p_1 = p_2 = 1$, but suppose now that the values of
$K_1$ and $K_2$ are interchanged, i.e., we have $K_1 = 0$ and $K_2 = 4$. Is the firm’s
new optimal output level different from the old level? Why or why not? 

. A firm produces a single output $y$ using three inputs $x_1$, $x_2$, and $x_3$ in nonnegative
quantities through the relationship

$$y = h(x_1, x_2, x_3) = x_1 (x_2 + x_3).$$

The unit price of $y$ is $p_y > 0$, while that of the input $x_i$ is $w_i > 0$, $i = 1, 2, 3$.


(a) Describe the firm’s profit-maximization problem, and derive the equations 
that define the critical points of the Lagrangean L in this problem. 

(b) Show that the Lagrangean L has multiple critical points for any choice of
$(p_y, w_1, w_2, w_3) \in \mathbb{R}^4_{++}$.

(c) Show that none of these critical points identifies a solution of the profit- 
maximization problem. Can you explain why this is the case? 


7 


Convex Structures in Optimization Theory 


The notion of convexity occupies a central position in the study of optimization 
theory. It encompasses not only the idea of convex sets, but also of concave and 
convex functions (see Section 7.1 for definitions). The attractiveness of convexity for 
optimization theory arises from the fact that when an optimization problem meets 
suitable convexity conditions, the same first-order conditions that we have shown in 
previous chapters to be necessary for local optima, also become sufficient for global 
optima. Indeed, even more is true. When the convexity conditions are tightened to 
what are called strict convexity conditions, we get the additional bonus of uniqueness 
of the solution. 

The importance of such results, especially from a computational standpoint, is 
obvious. Of course, such a marked strengthening of our earlier analysis does not 
come free. As we show in Section 7.2, the assumption of convexity is a strong one. 
A function that is concave or convex must necessarily be continuous everywhere on 
the interior of its domain. It must also possess strong differentiability properties; for 
instance, all directional derivatives of such a function must exist at all points in the 
domain. Finally, an assumption of convexity imposes strong curvature restrictions 
on the underlying function, in the form of properties that must be met by its first- 
and second-derivatives. 

These results indicate that an assumption of convexity is not an innocuous one, 
but, viewed from the narrow standpoint of this book, the restrictive picture they paint 
is perhaps somewhat exaggerated. For one thing, we continue to assume in study- 
ing constrained optimization problems that all the functions involved are (at least) 
continuously differentiable. Continuous differentiability is a much greater degree of 
smoothness than can be obtained from just an assumption of convexity; thus, the 
continuity and differentiability properties of concave and convex functions, which 
appear very strong when viewed in isolation, certainly do not imply any increased 
restrictiveness on this problem’s structure. Although the same cannot be said of the 
curvature implications of convexity, even these are not very significant in economic 
applications, since they are often justifiable by an appeal to such considerations as
diminishing marginal utility, or diminishing marginal product. 


7.1 Convexity Defined 


Recall from Chapter 1 that a set $D \subset \mathbb{R}^n$ is called convex if the convex combination
of any two points in $D$ is itself in $D$, that is, if for all $x$ and $y$ in $D$ and all $\lambda \in (0, 1)$,
it is the case that $\lambda x + (1 - \lambda) y \in D$. Building on this definition, we introduce
in subsection 7.1.1 two classes of functions called concave functions and convex 
functions. 

Concave and convex functions play an important role in the study of maximization 
and minimization problems, respectively. Their significance arises primarily from the
fact that in problems with a convex constraint set and a concave objective function,
the first-order conditions are both necessary and sufficient to identify global maxima;
while in problems with a convex constraint set and a convex objective function, the
first-order conditions are necessary and sufficient to identify global minima.¹

Motivated by this, we will say in the sequel that a maximization problem is a convex 
maximization problem if the constraint set is convex and the objective function is 
concave. A minimization problem will similarly be said to be a convex minimization 
problem if the constraint set is convex, and the objective function is convex. More 
generally, we will say that an optimization problem is a convex optimization problem, 
or has a convex environment, if it is either a convex maximization problem, or a convex 
minimization problem. Thus, our use of the word “convexity” encompasses the entire 
collective of convex sets, and concave and convex functions. 

Following our definition of concave and convex functions in subsection 7.1.1, 
we define the more restricted notions of strictly concave and strictly convex func- 
tions in subsection 7.1.2. In addition to possessing all the desirable properties of 
concave and convex functions, respectively, strictly concave and strictly convex 
functions also possess the remarkable feature that they guarantee uniqueness of 
solutions. That is, a convex maximization problem with a strictly concave objec- 
tive function can have at most one solution, as can a convex minimization problem 
with a strictly convex objective function. In an obvious extension of the terminology
introduced earlier, we will say that a maximization problem is a strictly
convex maximization problem if it is a convex maximization problem, and the 
objective function is strictly concave. Strictly convex minimization problems are 
similarly defined. 


¹ As always, some additional regularity conditions may have to be met.

Fig. 7.1. The Epigraph and Subgraph


7.1.1 Concave and Convex Functions 
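Let $f: D \subset \mathbb{R}^n \to \mathbb{R}$. Through the rest of this chapter, it will be assumed that $D$ is
a convex set.
The subgraph of $f$ and the epigraph of $f$, denoted sub $f$ and epi $f$, are defined
by:

$$\operatorname{sub} f = \{(x, y) \in D \times \mathbb{R} \mid f(x) \geq y\}$$

$$\operatorname{epi} f = \{(x, y) \in D \times \mathbb{R} \mid f(x) \leq y\}.$$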


Intuitively, the subgraph of a function is the area lying below the graph of a function, 
while the epigraph of a function is the area lying above the graph of the function. 
Figure 7.1 illustrates these concepts. 


A function f is said to be concave on D if sub f is a convex set, and to be convex 
on D if epi f is a convex set. 


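Theorem 7.1 A function $f: D \to \mathbb{R}$ is concave on $D$ if and only if for all $x, y \in D$
and $\lambda \in (0, 1)$, it is the case that

$$f[\lambda x + (1 - \lambda) y] \geq \lambda f(x) + (1 - \lambda) f(y).$$

Similarly, $f: D \to \mathbb{R}$ is convex if and only if for all $x, y \in D$ and $\lambda \in (0, 1)$, it is the
case that

$$f[\lambda x + (1 - \lambda) y] \leq \lambda f(x) + (1 - \lambda) f(y).$$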
Proof First, suppose $f$ is concave, i.e., sub $f$ is a convex set. Let $x$ and $y$ be arbitrary
points in $D$. Then, $(x, f(x)) \in \operatorname{sub} f$ and $(y, f(y)) \in \operatorname{sub} f$. Since sub $f$ is a
convex set, it is the case that for any $\lambda \in (0, 1)$

$$(\lambda x + (1 - \lambda) y, \; \lambda f(x) + (1 - \lambda) f(y)) \in \operatorname{sub} f.$$

By definition of sub $f$, a point $(w, z)$ is in sub $f$ only if $f(w) \geq z$. Therefore,

$$f[\lambda x + (1 - \lambda) y] \geq \lambda f(x) + (1 - \lambda) f(y),$$

as required.

Now, suppose that for all $x, y \in D$, and all $\lambda \in (0, 1)$, it is the case
that

$$f[\lambda x + (1 - \lambda) y] \geq \lambda f(x) + (1 - \lambda) f(y).$$

We will show that sub $f$ is a convex set, i.e., that if $(w_1, z_1)$ and $(w_2, z_2)$ are arbitrary
points in sub $f$, and $\lambda \in (0, 1)$, then $(\lambda w_1 + (1 - \lambda) w_2, \; \lambda z_1 + (1 - \lambda) z_2)$ is also in
sub $f$.

By definition of sub $f$, we must have $f(w_1) \geq z_1$ and $f(w_2) \geq z_2$. By hypothesis,
if $\lambda \in (0, 1)$, we must also have

$$f[\lambda w_1 + (1 - \lambda) w_2] \geq \lambda f(w_1) + (1 - \lambda) f(w_2).$$

Therefore, we have $f[\lambda w_1 + (1 - \lambda) w_2] \geq \lambda z_1 + (1 - \lambda) z_2$, or

$$(\lambda w_1 + (1 - \lambda) w_2, \; \lambda z_1 + (1 - \lambda) z_2) \in \operatorname{sub} f.$$

This completes the proof for concave functions. The result for convex functions is
proved along analogous lines. □


The notions of concavity and convexity are neither exhaustive nor mutually exclusive;
that is, there are functions that are neither concave nor convex, and functions
that are both concave and convex. As an example of a function of the former sort,
consider $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ for $x \in \mathbb{R}$. Let $x = -2$ and $y = 2$. For
$\lambda = 1/4$, we have

$$f(\lambda x + (1 - \lambda) y) = f(1) = 1 < 4 = \lambda f(x) + (1 - \lambda) f(y),$$

so $f$ is not concave. On the other hand, for $\lambda = 3/4$, we have

$$f(\lambda x + (1 - \lambda) y) = f(-1) = -1 > -4 = \lambda f(x) + (1 - \lambda) f(y),$$

so $f$ is not convex either.
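
The two failures in this example can be checked by direct arithmetic. The following is a minimal sketch in Python; the helper names `f` and `gap` are illustrative and not part of the text. It evaluates the difference between the function value at the convex combination and the corresponding chord value, using exact rational arithmetic.

```python
from fractions import Fraction

def f(x):
    """The cubic f(x) = x^3 from the example."""
    return x ** 3

def gap(func, x, y, lam):
    """func(lam*x + (1-lam)*y) minus the chord value lam*func(x) + (1-lam)*func(y)."""
    return func(lam * x + (1 - lam) * y) - (lam * func(x) + (1 - lam) * func(y))

x, y = Fraction(-2), Fraction(2)
print(gap(f, x, y, Fraction(1, 4)))  # negative, so f is not concave
print(gap(f, x, y, Fraction(3, 4)))  # positive, so f is not convex
```

A negative gap at one triple $(x, y, \lambda)$ rules out concavity, and a positive gap at another rules out convexity; a single counterexample of each kind is all the argument needs.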
For an example of a function that is both concave and convex, pick any $a \in \mathbb{R}^n$
and $b \in \mathbb{R}$. Consider the function $f: \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = a \cdot x + b$. For any
$x$ and $y$ in $\mathbb{R}^n$, and any $\lambda \in (0, 1)$, we have

$$f[\lambda x + (1 - \lambda) y] = a \cdot [\lambda x + (1 - \lambda) y] + b$$
$$= \lambda (a \cdot x + b) + (1 - \lambda)(a \cdot y + b)$$
$$= \lambda f(x) + (1 - \lambda) f(y),$$

and it follows that $f$ has the required properties.² Such functions, which are both
concave and convex, are termed affine.³


7.1.2 Strictly Concave and Strictly Convex Functions 


A concave function $f: D \to \mathbb{R}$ is said to be strictly concave if for all $x, y \in D$ with
$x \neq y$, and all $\lambda \in (0, 1)$, we have

$$f[\lambda x + (1 - \lambda) y] > \lambda f(x) + (1 - \lambda) f(y).$$

Similarly, a convex function $f: D \to \mathbb{R}$ is said to be strictly convex if for all $x, y \in D$
with $x \neq y$, and for all $\lambda \in (0, 1)$, we have

$$f[\lambda x + (1 - \lambda) y] < \lambda f(x) + (1 - \lambda) f(y).$$


It is trivial to give examples of functions that are concave, but not strictly concave,
or convex, but not strictly convex. Any affine function, for instance, is both concave
and convex, but is neither strictly concave nor strictly convex. On the other hand, the
function $f: \mathbb{R}_{++} \to \mathbb{R}$ defined by

$$f(x) = x^{\alpha}$$

is strictly concave if $0 < \alpha < 1$, and is strictly convex if $\alpha > 1$. At the “knife-edge”
point $\alpha = 1$, the function is both concave and convex, but is neither strictly concave
nor strictly convex.
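
The three regimes of the exponent can be illustrated numerically. The sketch below (in Python; the function name `midpoint_gap` and the sample points 1 and 4 are choices made here for illustration) compares the value of $x^{\alpha}$ at a midpoint with the average of its values at the endpoints. A check at one pair of points is evidence, not a proof.

```python
def midpoint_gap(alpha, x, y):
    """f((x+y)/2) - (f(x)+f(y))/2 for f(t) = t**alpha, with x, y > 0."""
    f = lambda t: t ** alpha
    return f((x + y) / 2) - (f(x) + f(y)) / 2

for alpha in (0.5, 1.0, 2.0):
    # positive gap: consistent with strict concavity; zero: affine; negative: strict convexity
    print(alpha, midpoint_gap(alpha, 1.0, 4.0))
```

The sign of the gap changes from positive to zero to negative as the exponent passes through 1, which matches the classification stated above.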

Our last result of this section is an immediate consequence of the definitions of 
concavity and convexity. Its most important implication is that it enables us, in the 
sequel, to concentrate solely on concave functions, whenever this is convenient for 
expositional reasons. The analogous statements for convex functions in such cases 
are easily derived, and are left to the reader. 


Theorem 7.2 A function $f: D \to \mathbb{R}$ is concave on $D$ if and only if the function $-f$
is convex on $D$. It is strictly concave on $D$ if and only if $-f$ is strictly convex on $D$.

² Indeed, it can be shown that if a function on $\mathbb{R}^n$ is both concave and convex on $\mathbb{R}^n$, then it must have the
form $f(x) = a \cdot x + b$ for some $a$ and $b$.
³ The term “linear” is usually reserved for affine functions which also satisfy $f(0) = 0$.


7.2 Implications of Convexity


This section is divided into three parts which examine, respectively, the continuity, 
differentiability, and curvature properties that obtain from an assumption of convex- 
ity. Our main results here are that: 


• Every concave or convex function must also be continuous on the interior of its
domain (subsection 7.2.1).

• Every concave or convex function must possess minimal differentiability properties
(subsection 7.2.2). Among other things, every directional derivative must be
well-defined at all points in the domain of a concave or convex function.

• The concavity or convexity of an everywhere differentiable function $f$ can be
completely characterized in terms of the behavior of its derivative $Df$, and the
concavity or convexity of a $C^2$ function $f$ can be completely characterized in
terms of the behavior of its second derivative $D^2 f$ (subsection 7.2.3).


Several other useful properties of concave and convex functions are listed in the 
Exercises. 

With the exception of Theorem 7.9, which is used in the proof of Theorem 7.15, 
none of the results of this section play any role in the sequel, so readers who are in a 
hurry (and who are willing to accept on faith that convexity has strong implications) 
may wish to skip this section altogether. The statement of one result, however, may be 
worth spending some time on, on account of its practical significance: Theorem 7.10 
in subsection 7.2.3 describes a test for identifying when a $C^2$ function is concave or
convex, and this test is almost always easier to use in applications than the original 
definition. 


7.2.1 Convexity and Continuity 


Our main result in this subsection is that a concave function must be continuous 
everywhere on its domain, except perhaps at boundary points. 


Theorem 7.3 Let $f: D \to \mathbb{R}$ be a concave function. Then, if $D$ is open, $f$ is
continuous on $D$. If $D$ is not open, $f$ is continuous on the interior of $D$.


Proof We will prove that if D is open and f is concave on D, then f must be 
continuous on D. Since int D is always open for any set D, and since the concavity 
of f on D also implies its concavity on int D, this result will also prove that even if 
D is not open, f must be continuous on the interior of D. 

So suppose $D$ is open and $x \in D$. Let $x_k \to x$, $x_k \in D$ for all $k$. Since $D$ is open,
there is $r > 0$ such that $B(x, r) \subset D$. Pick $\alpha$ such that $0 < \alpha < r$. Let $A \subset B(x, r)$
be defined by $A = \{z \mid \|z - x\| = \alpha\}$. Pick $K$ large enough so that for all $k \geq K$, we
have $\|x_k - x\| < \alpha$. Since $x_k \to x$, such a $K$ exists.

Fig. 7.2. Convexity and Continuity

Then, for all $k \geq K$, there is $z_k \in A$ such that $x_k = \theta_k x + (1 - \theta_k) z_k$ for some
$\theta_k \in (0, 1)$ (see Figure 7.2). Since $x_k \to x$, and $\|z_k - x\| = \alpha > 0$ for all $k$, it is the
case that $\theta_k \to 1$. Therefore, by the concavity of $f$,

$$f(x_k) = f(\theta_k x + (1 - \theta_k) z_k) \geq \theta_k f(x) + (1 - \theta_k) f(z_k).$$

Taking limits, we now have

$$\liminf_{k \to \infty} f(x_k) \geq f(x).$$

Secondly, it is also true that for all $k \geq K$, there is $w_k \in A$ and $\lambda_k \in (0, 1)$ such
that $x = \lambda_k x_k + (1 - \lambda_k) w_k$ (see Figure 7.2). Once again, exploiting the concavity
of $f$, we have

$$f(x) = f(\lambda_k x_k + (1 - \lambda_k) w_k) \geq \lambda_k f(x_k) + (1 - \lambda_k) f(w_k).$$

Since $\lambda_k$ must go to 1 as $k \to \infty$, by taking limits we obtain

$$f(x) \geq \limsup_{k \to \infty} f(x_k).$$

We have already established that $\liminf_{k \to \infty} f(x_k) \geq f(x)$. Therefore, we must have
$\lim_{k \to \infty} f(x_k) = f(x)$, and the proof is complete. □


The conclusions of Theorem 7.3 cannot be strengthened to obtain continuity on 
all of D; that is, the continuity of f could fail at the boundary points of D. Consider 
the following example: 
Example 7.4 Define $f: [0, 1] \to \mathbb{R}$ by

$$f(x) = \begin{cases} \sqrt{x}, & \text{if } 0 < x < 1 \\ -1, & \text{if } x = 0, 1. \end{cases}$$

Then, $f$ is concave (in fact, strictly concave) on $[0, 1]$, but is discontinuous at the
boundary points 0 and 1. □
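
A function of the kind described in Example 7.4 can be probed on a grid. The sketch below (in Python) takes $f(x) = \sqrt{x}$ on the open interval and, as an assumption of this sketch, the value $-1$ at both endpoints; the helper `is_concave_on_grid` and the grid sizes are illustrative choices. It tests the concavity inequality at grid points and prints values near and at the endpoints to show the jumps.

```python
import math

def f(x):
    """sqrt(x) on the open interval (0, 1); -1 at the endpoints (assumed value)."""
    return math.sqrt(x) if 0 < x < 1 else -1.0

def is_concave_on_grid(func, n=50):
    """Check func(l*x + (1-l)*y) >= l*func(x) + (1-l)*func(y) on a grid in [0, 1]."""
    pts = [i / n for i in range(n + 1)]
    lams = [j / 10 for j in range(1, 10)]
    for x in pts:
        for y in pts:
            for lam in lams:
                z = lam * x + (1 - lam) * y
                # small tolerance absorbs floating-point rounding
                if func(z) < lam * func(x) + (1 - lam) * func(y) - 1e-12:
                    return False
    return True

print(is_concave_on_grid(f))   # whether the inequality holds at every grid triple
print(f(1e-9), f(0.0))         # value just inside 0 versus value at 0
print(f(1 - 1e-9), f(1.0))     # value just inside 1 versus value at 1
```

A grid check cannot establish concavity, but it does illustrate that lowering the values of a concave function at boundary points preserves the inequality while breaking continuity there.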


7.2.2 Convexity and Differentiability 


As with continuity, the assumption of convexity also carries strong implications for 
the differentiability of the function involved. We examine some of these implica- 
tions in this subsection. We begin by establishing properties of concave functions 
of one variable. Then we use these properties in a bootstrapping argument to derive 
analogous results for concave functions defined on $\mathbb{R}^n$.


Theorem 7.5 Let $g: D \to \mathbb{R}$ be concave, where $D \subset \mathbb{R}$ is open and convex. Let $x_1$,
$x_2$, and $x_3$ be points in $D$ satisfying $x_1 < x_2 < x_3$. Then,

$$\frac{g(x_2) - g(x_1)}{x_2 - x_1} \geq \frac{g(x_3) - g(x_1)}{x_3 - x_1} \geq \frac{g(x_3) - g(x_2)}{x_3 - x_2}.$$

If $g$ is strictly concave, these inequalities become strict.


Remark Figure 7.3 describes the content of Theorem 7.5 in a graph. □


Proof Define $\alpha = (x_2 - x_1)/(x_3 - x_1)$. Then $\alpha \in (0, 1)$, and some simple calculations
show that $(1 - \alpha) = (x_3 - x_2)/(x_3 - x_1)$, and that $\alpha x_3 + (1 - \alpha) x_1 = x_2$.
Since $g$ is concave, we have

$$g[\alpha x_3 + (1 - \alpha) x_1] \geq \alpha g(x_3) + (1 - \alpha) g(x_1),$$

with strict inequality if $g$ is strictly concave. Therefore,

$$g(x_2) \geq \frac{x_2 - x_1}{x_3 - x_1} g(x_3) + \frac{x_3 - x_2}{x_3 - x_1} g(x_1),$$

with strict inequality if $g$ is strictly concave. Rearranging this expression, we finally
obtain

$$\frac{g(x_2) - g(x_1)}{x_2 - x_1} \geq \frac{g(x_3) - g(x_1)}{x_3 - x_1},$$

with strict inequality if $g$ is strictly concave. This establishes one of the required
inequalities. The other inequalities may be obtained in a similar way. □
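
The ordering of chord slopes in Theorem 7.5 is easy to observe on a concrete concave function. The sketch below (in Python) uses the natural logarithm and the points 1, 2, 4; both are choices made here for illustration, and the helper names are hypothetical.

```python
import math

def chord_slope(g, a, b):
    """Slope of the chord of g between a and b (a != b)."""
    return (g(b) - g(a)) / (b - a)

def slopes(g, x1, x2, x3):
    """The three slopes of Theorem 7.5, in the order they appear there."""
    return (chord_slope(g, x1, x2), chord_slope(g, x1, x3), chord_slope(g, x2, x3))

s12, s13, s23 = slopes(math.log, 1.0, 2.0, 4.0)
print(s12, s13, s23)  # the slopes decline from left to right
```

Since the logarithm is strictly concave, the three slopes are strictly decreasing, as the last sentence of the theorem predicts.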




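[Figure: the graph of a concave function $g$ with chords through $x_1 < x_2 < x_3$, labelled with their slopes $(g(x_2) - g(x_1))/(x_2 - x_1)$, $(g(x_3) - g(x_1))/(x_3 - x_1)$, and $(g(x_3) - g(x_2))/(x_3 - x_2)$.]

Fig. 7.3. Concavity and Declining Slopes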


Theorem 7.5 implies, in particular, that if $g: \mathbb{R} \to \mathbb{R}$ is a concave function, and $x$
is any point in $\mathbb{R}$, the difference quotient

$$\frac{g(x + b) - g(x)}{b}$$

is nonincreasing in $b$ for $b > 0$. Therefore, allowing $+\infty$ as a limiting value, the
limit of this expression exists as $b \to 0+$. It is now easy to establish that a concave
function on $\mathbb{R}$ must possess all its directional derivatives:


Theorem 7.6 If $g: \mathbb{R} \to \mathbb{R}$ is concave then all (one-sided) directional derivatives
of $g$ exist at all points $x$ in $\mathbb{R}$, although some of these may have infinite values.


Proof Fix any $x$ and $y$ in $\mathbb{R}$. First, suppose $y > 0$. Then,

$$\frac{g(x + ty) - g(x)}{t} = y \cdot \frac{g(x + ty) - g(x)}{ty}.$$

Since $y > 0$, it is the case that $ty > 0$ whenever $t > 0$; therefore, $ty \to 0+$ if and
only if $t \to 0+$. Letting $b = ty$, we have

$$\lim_{t \to 0+} \frac{g(x + ty) - g(x)}{t} = y \cdot \lim_{b \to 0+} \frac{g(x + b) - g(x)}{b}.$$

We have already established as a consequence of Theorem 7.5 that the limit of the
second expression on the right-hand side exists as $b \to 0+$. Therefore, the directional
derivative of $g$ at $x$ exists in any direction $y > 0$.
Now suppose $y < 0$. Then, we have

$$\frac{g(x + ty) - g(x)}{t} = (-y) \cdot \frac{g(x + ty) - g(x)}{-ty}.$$

Since $y < 0$, we have $-ty > 0$ whenever $t > 0$; therefore, $-ty \to 0+$ if and only
if $t \to 0+$. Letting $b = -ty$, so that $x + ty = x - b$, we have

$$\lim_{t \to 0+} \frac{g(x + ty) - g(x)}{t} = (-y) \cdot \lim_{b \to 0+} \frac{g(x - b) - g(x)}{b}.$$

The limit on the right-hand side is well-defined: Theorem 7.5 implies in the same way
that $(g(x - b) - g(x))/b$ is nonincreasing in $b$ for $b > 0$. We have therefore shown
that the directional derivative of $g$ at $x$ exists in any direction $y < 0$ also.


Finally, if $y = 0$, the directional derivative at $x$ in the direction $y$ trivially exists
since

$$\frac{g(x + ty) - g(x)}{t} = \frac{g(x) - g(x)}{t} = 0,$$

and this completes the proof. □
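
The one-sided nature of these derivatives is worth seeing on a function with a kink. The sketch below (in Python) uses $g(x) = -|x|$, a concave function chosen here for illustration, and evaluates the difference quotient at $x = 0$ in the directions $1$, $-1$, and $0$ for a shrinking sequence of step sizes $t$.

```python
def directional_quotient(g, x, y, t):
    """Difference quotient (g(x + t*y) - g(x)) / t for t > 0."""
    return (g(x + t * y) - g(x)) / t

def g(x):
    """A concave function on R with a kink at 0."""
    return -abs(x)

for y in (1.0, -1.0, 0.0):
    qs = [directional_quotient(g, 0.0, y, 2.0 ** -k) for k in range(1, 6)]
    print(y, qs)
```

In both nonzero directions the quotient equals $-1$ for every step size, so both one-sided directional derivatives exist and equal $-1$. Because they are not negatives of one another, $g$ is not differentiable at 0, which shows that the existence of all directional derivatives is weaker than differentiability.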


An n-dimensional analogue of this result is now easy to establish: 


Theorem 7.7 Let $D \subset \mathbb{R}^n$ be open and convex. Then, if $f: D \to \mathbb{R}$ is concave, $f$
possesses all directional derivatives at all points in $D$. (These derivatives could be
infinite.)


Proof Let $x \in D$ and $h \in \mathbb{R}^n$ be arbitrary. Define $g(t) = f(x + th)$, $t \geq 0$. Since
$D$ is open, $g$ is well-defined in a neighborhood of zero. Note that

$$\frac{f(x + th) - f(x)}{t} = \frac{g(t) - g(0)}{t}.$$

Moreover, for any $\alpha \in (0, 1)$,

$$g[\alpha t + (1 - \alpha) t'] = f[\alpha (x + th) + (1 - \alpha)(x + t'h)]$$
$$\geq \alpha f(x + th) + (1 - \alpha) f(x + t'h)$$
$$= \alpha g(t) + (1 - \alpha) g(t'),$$

so $g$ is concave on $\mathbb{R}_+$. By Theorem 7.6, therefore, the difference quotient

$$\frac{g(t) - g(0)}{t}$$

is nondecreasing as $t \to 0+$, so the limit as $t \to 0+$ of $(g(t) - g(0))/t$ exists,
although it could be infinite. From the definition of $g$, this limit is $Df(x; h)$. Since
$x \in D$ and $h \in \mathbb{R}^n$ were arbitrary, we are done. □


A natural question that arises from Theorem 7.7 is whether, or to what extent, 
convexity actually implies full differentiability of a function. The answer to this 
question, unfortunately, requires a knowledge of the concept of Lebesgue measure 
on $\mathbb{R}^n$. For readers unfamiliar with measure theory, an interpretation of the result is
given following the statement of the theorem. 


Theorem 7.8 Let $D$ be an open and convex set in $\mathbb{R}^n$, and let $f: D \to \mathbb{R}$ be
concave. Then $f$ is differentiable everywhere on $D$, except possibly at a set of points
of Lebesgue measure zero. Moreover, the derivative $Df$ of $f$ is continuous at all
points where it exists.


Proof See Fenchel (1953, p.87). □


Remark A property which holds everywhere on a subset of $\mathbb{R}^n$, except possibly
on a set of Lebesgue measure zero, is also said to hold “almost everywhere” on
that subset (or, more accurately, “almost everywhere with respect to the Lebesgue
measure”). In this terminology, Theorem 7.8 would be stated as: a concave or convex
function defined on an open subset of $\mathbb{R}^n$ must be differentiable almost everywhere
on its domain. □


The following discussion is aimed at readers who are unfamiliar with measure
theory. It attempts to provide an intuitive idea of what is meant by a set of Lebesgue
measure zero. A more formal description may be found in Royden (1963) or Billingsley
(1978). Define a cylinder in $\mathbb{R}^n$ to be a set $E$ of the form

$$E = \{(x_1, \ldots, x_n) \mid a_i < x_i \leq b_i, \; i = 1, \ldots, n\},$$

where $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ are given vectors satisfying $a \ll b$.
The Lebesgue measure of this cylinder, denoted $\mu(E)$ say, is defined as

$$\mu(E) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n).$$

Observe that the Lebesgue measure is a natural generalization of the familiar “measures”
of length in $\mathbb{R}$, area in $\mathbb{R}^2$, and volume in $\mathbb{R}^3$. A cylinder in $\mathbb{R}$ is simply an
interval $(a, b]$, and its Lebesgue measure is $(b - a)$, which is the same as the interval’s
length. Similarly, a cylinder in $\mathbb{R}^2$ is a rectangle, whose Lebesgue measure
corresponds to our usual notion of the rectangle’s area, while a cylinder in $\mathbb{R}^3$ is a
cube, whose Lebesgue measure corresponds to our usual notion of volume.

In an intuitive sense, a set of Lebesgue measure zero may be thought of as a set
that can be covered using cylinders whose combined Lebesgue measure can be made
arbitrarily small. That is, a set $X \subset \mathbb{R}^n$ has Lebesgue measure zero if it is true
that given any $\epsilon > 0$, there is a collection of cylinders $E_1, E_2, \ldots$ in $\mathbb{R}^n$ such that
$X \subset \bigcup_{i=1}^{\infty} E_i$ and

$$\sum_{i=1}^{\infty} \mu(E_i) < \epsilon.$$

For instance, the rationals $\mathbb{Q}$ constitute a set of measure zero in $\mathbb{R}$. To see this, take
any enumeration $\{r_1, r_2, \ldots\}$ of the rationals (recall that the rationals are a countable
set), and any $\epsilon > 0$. Pick $\eta$ such that $0 < \eta < \epsilon$. For each $k$, define $r_k^-$ and $r_k^+$ by

$$r_k^- = r_k - \left(\tfrac{1}{2}\right)^{k+1} \eta, \qquad r_k^+ = r_k + \left(\tfrac{1}{2}\right)^{k+1} \eta,$$

and let $E_k$ be the interval $(r_k^-, r_k^+]$. Note that

$$\mu(E_k) = 2 \left(\tfrac{1}{2}\right)^{k+1} \eta = \left(\tfrac{1}{2}\right)^{k} \eta.$$

We now have $r_k \in E_k$ for each $k$, so $\mathbb{Q} \subset \bigcup_{k=1}^{\infty} E_k$. Moreover,

$$\sum_{k=1}^{\infty} \mu(E_k) = \sum_{k=1}^{\infty} \left(\tfrac{1}{2}\right)^{k} \eta = \eta < \epsilon,$$

so we have covered the rationals by cylinders of total measure less than $\epsilon$. Since $\epsilon$
was arbitrary, we are done.
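
The geometric series behind this cover can be tabulated exactly. The sketch below (in Python, using exact rational arithmetic) sums the lengths of the first few intervals of the cover, each of length $(1/2)^k \eta$; the function name `cover_length` and the value $\eta = 1/100$ are illustrative choices. The positions of the intervals play no role in the total, so no enumeration of the rationals is needed.

```python
from fractions import Fraction

def cover_length(eta, n_terms):
    """Total length of the first n_terms intervals E_k, each of length (1/2)**k * eta."""
    return sum(Fraction(1, 2) ** k * eta for k in range(1, n_terms + 1))

eta = Fraction(1, 100)
for n in (1, 5, 20):
    total = cover_length(eta, n)
    print(n, total, total < eta)  # partial sums stay strictly below eta
```

Each partial sum equals $\eta (1 - 2^{-n})$, so the totals increase toward $\eta$ without ever reaching it, in line with the bound used in the argument above.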


7.2.3 Convexity and the Properties of the Derivative 


In this section, we establish the implications of convexity for the behavior of the 
derivative. Our first result provides a complete characterization of the concavity or 
convexity of an everywhere differentiable function f using its first derivative Df: 


Theorem 7.9 Let $D$ be an open and convex set in $\mathbb{R}^n$, and let $f: D \to \mathbb{R}$ be
differentiable on $D$. Then, $f$ is concave on $D$ if and only if

$$Df(x)(y - x) \geq f(y) - f(x) \quad \text{for all } x, y \in D,$$

while $f$ is convex on $D$ if and only if

$$Df(x)(y - x) \leq f(y) - f(x) \quad \text{for all } x, y \in D.$$

Proof See Section 7.5. □
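
The first-derivative characterization in Theorem 7.9 says that a concave function lies everywhere below its tangent planes. The sketch below (in Python) checks the inequality on an integer grid for the concave quadratic $f(x) = -x_1^2 - 2x_2^2$; the function, its hand-computed gradient, and the grid are all choices made here for illustration.

```python
from itertools import product

def f(x):
    """A concave quadratic on R^2 (illustrative choice)."""
    return -x[0] ** 2 - 2 * x[1] ** 2

def Df(x):
    """Gradient of f, computed by hand."""
    return (-2 * x[0], -4 * x[1])

def gradient_gap(x, y):
    """Df(x)(y - x) - (f(y) - f(x)); nonnegative for all x, y when f is concave."""
    d = Df(x)
    linear = d[0] * (y[0] - x[0]) + d[1] * (y[1] - x[1])
    return linear - (f(y) - f(x))

grid = list(product(range(-3, 4), repeat=2))
print(min(gradient_gap(x, y) for x in grid for y in grid))  # smallest gap on the grid
```

For this quadratic the gap works out to $(y_1 - x_1)^2 + 2(y_2 - x_2)^2$, so it is zero only when $x = y$ and positive otherwise.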


The concavity or convexity of a $C^2$ function can also be characterized using its
second derivative, as our next result shows. The result also provides a sufficient
condition for identifying strictly concave and strictly convex functions, and is of
special interest from a computational standpoint, as we explain shortly.


Theorem 7.10 Let $f: D \to \mathbb{R}$ be a $C^2$ function, where $D \subset \mathbb{R}^n$ is open and convex.
Then,

1. $f$ is concave on $D$ if and only if $D^2 f(x)$ is a negative semidefinite matrix for all
$x \in D$.

2. $f$ is convex on $D$ if and only if $D^2 f(x)$ is a positive semidefinite matrix for all
$x \in D$.

3. If $D^2 f(x)$ is negative definite for all $x \in D$, then $f$ is strictly concave on $D$.

4. If $D^2 f(x)$ is positive definite for all $x \in D$, then $f$ is strictly convex on $D$.


Proof See Section 7.6. □


It is very important to note that parts 3 and 4 of Theorem 7.10 are only one- 
way implications. The theorem does not assert the necessity of these conditions, 
and, indeed, it is easy to see that the conditions cannot be necessary. Consider the 
following example: 


Example 7.11 Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = -x^4$ and
$g(x) = x^4$, respectively. Then, $f$ is strictly concave on $\mathbb{R}$, while $g$ is strictly convex
on $\mathbb{R}$. However, $f''(0) = g''(0) = 0$, so, viewed as $1 \times 1$ matrices, $f''(0)$ is not
negative definite, while $g''(0)$ is not positive definite. □


Our next example demonstrates the significance of Theorem 7.10 from a practical 
standpoint. 


Example 7.12 Let $f: \mathbb{R}^2_{++} \to \mathbb{R}$ be defined by

$$f(x, y) = x^a y^b, \quad a, b > 0.$$

For given $a$ and $b$, this function is concave if, for any $(x, y)$ and $(\hat{x}, \hat{y})$ in $\mathbb{R}^2_{++}$ and
any $\lambda \in (0, 1)$, we have

$$[\lambda x + (1 - \lambda) \hat{x}]^a [\lambda y + (1 - \lambda) \hat{y}]^b \geq \lambda x^a y^b + (1 - \lambda) \hat{x}^a \hat{y}^b,$$

while it is convex on $\mathbb{R}^2_{++}$ if for all $(x, y)$ and $(\hat{x}, \hat{y})$ in $\mathbb{R}^2_{++}$, and all $\lambda \in (0, 1)$, we have

$$[\lambda x + (1 - \lambda) \hat{x}]^a [\lambda y + (1 - \lambda) \hat{y}]^b \leq \lambda x^a y^b + (1 - \lambda) \hat{x}^a \hat{y}^b.$$

Compare checking for the convexity properties of $f$ by using these inequalities to
checking for the convexity properties of $f$ using the second derivative test provided by
Theorem 7.10. The latter only requires us to identify the definiteness of the following
matrix:

$$D^2 f(x, y) = \begin{pmatrix} a(a - 1) x^{a-2} y^{b} & ab \, x^{a-1} y^{b-1} \\ ab \, x^{a-1} y^{b-1} & b(b - 1) x^{a} y^{b-2} \end{pmatrix}.$$

The determinant of this matrix is $ab(1 - a - b) x^{2a-2} y^{2b-2}$, which is positive if
$a + b < 1$, zero if $a + b = 1$, and negative if $a + b > 1$. Since the diagonal terms
are negative whenever $a, b < 1$, it follows that $f$ is a strictly concave function if
$a + b < 1$, that it is a concave function if $a + b = 1$, and that it is neither concave
nor convex when $a + b > 1$. □
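
The second-derivative test of Example 7.12 reduces to a few arithmetic operations. The sketch below (in Python) evaluates the second-derivative matrix of $x^a y^b$ at the point $(1, 1)$ for three pairs of exponents; the helper names and the sample exponents are choices made here for illustration, and the derivative formulas are entered by hand rather than computed symbolically.

```python
def hessian(a, b, x, y):
    """Second-derivative matrix of f(x, y) = x**a * y**b at a point with x, y > 0."""
    fxx = a * (a - 1) * x ** (a - 2) * y ** b
    fxy = a * b * x ** (a - 1) * y ** (b - 1)
    fyy = b * (b - 1) * x ** a * y ** (b - 2)
    return ((fxx, fxy), (fxy, fyy))

def determinant(h):
    """Determinant of a 2 x 2 matrix given as nested tuples."""
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]

for a, b in ((0.25, 0.25), (0.5, 0.5), (1.0, 1.0)):
    h = hessian(a, b, 1.0, 1.0)
    # diagonal entries and determinant together decide definiteness in the 2 x 2 case
    print((a, b), h[0][0], h[1][1], determinant(h))
```

At $(1, 1)$ the determinant equals $ab(1 - a - b)$, so the three pairs give a positive, a zero, and a negative determinant respectively, matching the cases $a + b < 1$, $a + b = 1$, and $a + b > 1$. Evaluating at one point only illustrates the sign pattern; the theorem requires the condition at every point of the domain.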


7.3 Convexity and Optimization 


This section is divided into three parts. In subsection 7.3.1, we point out some simple, 
but strong, implications of assuming a convex structure in abstract optimization 
problems. Subsection 7.3.2 then deals with sufficiency of first-order conditions for 
unconstrained optima. Finally, in subsection 7.3.3, we present one of the main results 
of this section, Theorem 7.16, which shows that under a mild regularity condition, 
the Kuhn—Tucker first-order conditions are both necessary and sufficient to identify 
optima of convex inequality-constrained optimization problems. 

All results in this section are stated in the context of maximization problems. Each 
has an exact analogue in the context of minimization problems; it is left to the reader 
to fill in the details. 


7.3.1 Some General Observations 


This section presents two results which indicate the importance of convexity for 
optimization theory. The first result (Theorem 7.13) establishes that in convex opti- 
mization problems, all local optima must also be global optima; and, therefore, that 
to find a global optimum in such problems, it always suffices to locate a local optimum.
The second result (Theorem 7.14) shows that if a strictly convex optimization
problem admits a solution, the solution must be unique. 


Theorem 7.13 Suppose $D \subset \mathbb{R}^n$ is convex, and $f: D \to \mathbb{R}$ is concave. Then,


1. Any local maximum of $f$ is a global maximum of $f$.
2. The set $\arg\max\{f(x) \mid x \in D\}$ of maximizers of $f$ on $D$ is either empty or
convex.


Proof Suppose $f$ admits a local maximum $x$ that is not also a global maximum. Since
$x$ is a local maximum, there is $r > 0$ such that $f(x) \geq f(y)$ for all $y \in B(x, r) \cap D$.
Since $x$ is not a global maximum, there is $z \in D$ such that $f(z) > f(x)$. Since $D$ is
convex, $(\lambda x + (1 - \lambda) z) \in D$ for all $\lambda \in (0, 1)$. Pick $\lambda$ sufficiently close to unity so
that $(\lambda x + (1 - \lambda) z) \in B(x, r)$. By the concavity of $f$,

$$f[\lambda x + (1 - \lambda) z] \geq \lambda f(x) + (1 - \lambda) f(z) > f(x),$$

since $f(z) > f(x)$. But $\lambda x + (1 - \lambda) z$ is in $B(x, r)$ by construction, so $f(x) \geq
f[\lambda x + (1 - \lambda) z]$, a contradiction. This establishes Part 1.

To see Part 2, suppose $x_1$ and $x_2$ are both maximizers of $f$ on $D$. Then, we have
$f(x_1) = f(x_2)$. Further, for $\lambda \in (0, 1)$, we have

$$f[\lambda x_1 + (1 - \lambda) x_2] \geq \lambda f(x_1) + (1 - \lambda) f(x_2) = f(x_1),$$

and this must hold with equality or $x_1$ and $x_2$ would not be maximizers. Thus, the
set of maximizers must be convex, completing the proof. □


Theorem 7.14 Suppose $D \subset \mathbb{R}^n$ is convex, and $f: D \to \mathbb{R}$ is strictly concave. Then
$\arg\max\{f(x) \mid x \in D\}$ either is empty or contains a single point.


Proof Suppose $\arg\max\{f(x) \mid x \in D\}$ is nonempty. We will show it must contain
a single point.

We have already shown in Theorem 7.13 that $\arg\max\{f(x) \mid x \in D\}$ must be
convex. Suppose this set contains two distinct points $x$ and $y$. Pick any $\lambda \in (0, 1)$
and let $z = \lambda x + (1 - \lambda) y$. Then, $z$ must also be a maximizer of $f$, so we must have
$f(z) = f(x) = f(y)$. However, by the strict concavity of $f$,

$$f(z) = f[\lambda x + (1 - \lambda) y] > \lambda f(x) + (1 - \lambda) f(y) = f(x),$$

a contradiction. □


7.3.2 Convexity and Unconstrained Optimization


The following result shows that the first-order condition for unconstrained optima 
(i.e., the condition that Df(x) = 0) is both necessary and sufficient to identify global 
unconstrained maxima, when such maxima exist. 


Theorem 7.15 Let $D \subset \mathbb{R}^n$ be convex, and $f: D \to \mathbb{R}$ be a concave and differentiable
function on $D$. Then, $x$ is an unconstrained maximum of $f$ on $D$ if and only if

$$Df(x) = 0.$$


Proof We have shown in Chapter 4 that the condition $Df(x) = 0$ must hold whenever
$x$ is any unconstrained local maximum. It must evidently also hold, therefore,
if $x$ is an unconstrained global maximum.

The reverse implication (which requires the concavity of $f$) is actually an immediate
consequence of Theorem 7.9. For, suppose $x$ and $y$ are any two points in $D$.
Theorem 7.9 states that, by the concavity of $f$, we must have

$$f(y) - f(x) \leq Df(x)(y - x).$$

If $Df(x) = 0$, the right-hand side of this inequality is also zero, so the inequality states
precisely that $f(x) \geq f(y)$. Since $y \in D$ was arbitrary, $x$ is a global maximum of $f$
on $D$. □
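
The sufficiency half of Theorem 7.15 can be illustrated on a small example. The sketch below (in Python) takes the concave function $f(x) = -(x_1 - 1)^2 - (x_2 + 2)^2$, chosen here for illustration, confirms that its gradient vanishes at $(1, -2)$, and then compares the value there with the values on an integer grid.

```python
from itertools import product

def f(x):
    """A concave, differentiable function on R^2 (chosen for illustration)."""
    return -(x[0] - 1) ** 2 - (x[1] + 2) ** 2

def Df(x):
    """Gradient of f, computed by hand."""
    return (-2 * (x[0] - 1), -2 * (x[1] + 2))

x_star = (1, -2)
print(Df(x_star))  # the zero vector, so the first-order condition holds
grid = product(range(-5, 6), repeat=2)
print(all(f(y) <= f(x_star) for y in grid))  # no grid point does better
```

The grid comparison is only a spot check; the theorem is what guarantees that no point of the domain, on or off the grid, attains a higher value.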


7.3.3 Convexity and the Theorem of Kuhn and Tucker 
The following result, perhaps the most important of this entire section, states that the 
first-order conditions of the Theorem of Kuhn and Tucker are both necessary and 


sufficient to identify optima of convex inequality-constrained optimization problems, 
provided a mild regularity condition is met. 


Theorem 7.16 (The Theorem of Kuhn and Tucker under Convexity) Let $f$ be
a concave $C^1$ function mapping $U$ into $\mathbb{R}$, where $U \subset \mathbb{R}^n$ is open and convex. For
$i = 1, \ldots, l$, let $h_i: U \to \mathbb{R}$ also be concave $C^1$ functions. Suppose there is some
$\bar{x} \in U$ such that

$$h_i(\bar{x}) > 0, \quad i = 1, \ldots, l.$$

Then $x^*$ maximizes $f$ over

$$D = \{x \in U \mid h_i(x) \geq 0, \; i = 1, \ldots, l\}$$

if and only if there is $\lambda^* \in \mathbb{R}^l$ such that the Kuhn-Tucker first-order conditions hold:

[KT-1] $Df(x^*) + \sum_{i=1}^{l} \lambda_i^* Dh_i(x^*) = 0$.

[KT-2] $\lambda^* \geq 0, \quad \sum_{i=1}^{l} \lambda_i^* h_i(x^*) = 0$.

Proof See Section 7.7. □


The condition that there exist a point $\bar{x}$ at which $h_i(\bar{x}) > 0$ for all i is called
Slater’s condition. There are two points about this condition that bear stressing.

First, Slater’s condition is used only in the proof that [KT-1] and [KT-2] are nec-
essary at an optimum. It plays no role in proving sufficiency. That is, the conditions
[KT-1] and [KT-2] are sufficient to identify an optimum when f and the functions
$h_i$ are all concave, regardless of whether Slater’s condition is satisfied or not.

Second, the necessity of conditions [KT-1] and [KT-2] at any local maximum
was established in Theorem 6.1, but under a different hypothesis, namely, that the
rank condition described in Theorem 6.1 held. Effectively, the necessity part of
Theorem 7.16 states that this rank condition can be replaced with the combination
of Slater’s condition and concave constraint functions. However, both parts of this
combination are important: just as the necessity of [KT-1] and [KT-2] could fail if the
rank condition is not met, the necessity of [KT-1] and [KT-2] could also fail if either
Slater’s condition or the concavity of the functions $h_i$ fails. Consider the following
examples:


Example 7.17 Let f and h be functions on $\mathbb{R}$ defined by $f(x) = x$ and $h(x) = -x^2$
for all $x \in \mathbb{R}$. Then f and h are concave functions. However, the constraint set


$$D = \{x \in \mathbb{R} \mid h(x) \ge 0\}$$


consists of exactly the one point 0; thus, there is no point x such that $h(x) > 0$,
and Slater’s condition is violated. Evidently, the maximum of f on D must occur at
0. Since $f'(0) = 1$ and $h'(0) = 0$, there is no $\lambda^*$ such that $f'(0) + \lambda^* h'(0) = 0$, and
[KT-1] fails at this optimum. □
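
The failure of [KT-1] in Example 7.17 can be checked numerically. The sketch below uses f(x) = x and h(x) = -x² as in the example; the grid and the trial multiplier values are illustrative assumptions. It confirms that 0 is the only feasible grid point and that the left-hand side of [KT-1] at x = 0 equals 1 for every multiplier tried, so it can never be zero.

```python
# Example 7.17: f(x) = x, h(x) = -x^2 on the real line.
def f_prime(x):
    return 1.0

def h(x):
    return -x ** 2

def h_prime(x):
    return -2.0 * x

# Feasible points on an illustrative grid over [-3, 3]: only x = 0 has h(x) >= 0.
grid = [k / 100.0 for k in range(-300, 301)]
feasible = [x for x in grid if h(x) >= 0]

def kt1_residual(lam, x=0.0):
    # Left-hand side of [KT-1]: f'(x) + lambda * h'(x).
    return f_prime(x) + lam * h_prime(x)

# Because h'(0) = 0, the multiplier has no effect on the residual.
residuals = [kt1_residual(lam) for lam in (0.0, 1.0, 10.0, 1e6)]
print(feasible, residuals)
```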


Example 7.18 As in Example 6.4, let $f: \mathbb{R}^2 \to \mathbb{R}$ and $h: \mathbb{R}^2 \to \mathbb{R}$ be defined by
$f(x, y) = -x^2 - y^2$ and $h(x, y) = (x - 1)^3 - y^2$, respectively. Consider the problem
of maximizing f on the set


$$D = \{(x, y) \in \mathbb{R}^2 \mid h(x, y) \ge 0\}.$$

Note that while Slater’s condition is met in this problem (for instance, $h(x, y) =
1 > 0$ at $(x, y) = (2, 0)$), h is not concave on D. As a consequence, the conditions
[KT-1] and [KT-2] fail to be necessary; indeed, we have shown in Example 6.4 that
the unique global maximum of f on D occurs at $(x^*, y^*) = (1, 0)$, but that there
is no $\lambda^* \in \mathbb{R}$ such that $Df(x^*, y^*) + \lambda^* Dh(x^*, y^*) = 0$. Thus, [KT-1] fails at this
optimal point. □
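
Both claims in Example 7.18 can be verified by direct computation. The sketch below assumes the functions f(x, y) = -x² - y² and h(x, y) = (x - 1)³ - y² (the forms used in Example 6.4), with gradients computed by hand; the trial multipliers and the test points for concavity are illustrative choices. It shows that the first component of Df + λDh at (1, 0) is -2 for every λ, and that h violates the concavity inequality at the midpoint of two feasible points.

```python
# Assumed forms: f(x, y) = -x^2 - y^2 and h(x, y) = (x - 1)^3 - y^2.
def h(x, y):
    return (x - 1.0) ** 3 - y ** 2

def grad_f(x, y):
    return (-2.0 * x, -2.0 * y)

def grad_h(x, y):
    return (3.0 * (x - 1.0) ** 2, -2.0 * y)

x_star = (1.0, 0.0)  # the constrained maximizer identified in the example

def kt1_residual(lam):
    # Left-hand side of [KT-1] at x_star for a given multiplier.
    gf, gh = grad_f(*x_star), grad_h(*x_star)
    return tuple(a + lam * b for a, b in zip(gf, gh))

# Dh vanishes at (1, 0), so no multiplier can offset Df = (-2, 0).
residuals = [kt1_residual(lam) for lam in (0.0, 1.0, 100.0)]

# Concavity would require h(midpoint) >= average of endpoint values.
a, b, mid = (1.0, 0.0), (3.0, 0.0), (2.0, 0.0)
h_not_concave = h(*mid) < 0.5 * h(*a) + 0.5 * h(*b)  # compares 1 with 4

print(residuals, h_not_concave)
```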


The point of Example 7.18 is to stress that Slater’s condition cannot, by itself,
replace the rank condition of Theorem 6.1. Rather, this is possible only if the functions
$h_i$ are all also concave.


7.4 Using Convexity in Optimization 
The results of the previous section are of obvious importance in solving optimization 
problems. The purpose of this section is to highlight by repetition their value in this 
direction. We focus on inequality-constrained problems of the sort 


$$\text{Maximize } f(x) \text{ subject to } x \in D = \{z \in U \mid h_i(z) \ge 0,\ i = 1, \ldots, l\},$$


where the functions f and $h_i$ are all concave $C^1$ functions on the open and convex set
$U \subset \mathbb{R}^n$. Recall that the Lagrangean for this problem is the function $L: U \times \mathbb{R}^l \to \mathbb{R}$
defined by


$$L(x, \lambda) = f(x) + \sum_{i=1}^{l} \lambda_i h_i(x),$$


and the critical points of L are the points $(x, \lambda) \in U \times \mathbb{R}^l$ that satisfy the following
conditions:


$$Df(x) + \sum_{i=1}^{l} \lambda_i Dh_i(x) = 0,$$


$$\lambda_i \ge 0, \quad h_i(x) \ge 0, \quad \lambda_i h_i(x) = 0, \quad i = 1, \ldots, l.$$


Of course, as we have seen in Section 6.2, a point $(x, \lambda)$ is a critical point of L if and
only if $x \in D$ and $(x, \lambda)$ meets [KT-1] and [KT-2].

First, we consider the case where Slater’s condition is met. In this case, Theo- 
rem 7.16 has the powerful implication that the cookbook procedure for using the 
Theorem of Kuhn and Tucker (outlined in Section 6.2) can be employed “blindly,” 
that is, without regard to whether the conditions of Proposition 6.5 are met or not. 
Two factors give rise to, and strengthen, this implication. First, the concavity of the
functions f and $h_i$ together with Slater’s condition imply that the first-order condi-
tions [KT-1] and [KT-2] are necessary at any optimum. So, if an optimum exists at
all, it must satisfy these conditions, and must, therefore, turn up as part of a critical
point of the Lagrangean L. Second, since [KT-1] and [KT-2] are also sufficient for a
maximum, it is the case that every critical point identifies a solution to the problem.
Summing up:


• If L has no critical points, then no solution exists to the given problem.
• If $(x^*, \lambda^*)$ is a critical point of L, then $x^*$ is a solution of the problem.


In particular, the last step outlined in the cookbook procedure of Section 6.2 (viz., 
comparing the values of f at the different critical points of L), can be ignored: since 
each of these points identifies a global maximum, the value of f at all these points 
must, perforce, be the same. 
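
The procedure just described can be sketched on a toy problem. The problem below (maximize f(x) = -(x - 2)² subject to h(x) = 1 - x ≥ 0) is an illustrative assumption, not an example from the text; both functions are concave and Slater’s condition holds (for instance h(0) = 1 > 0). The code enumerates the two complementary-slackness cases by hand, collects the critical points of the Lagrangean, and compares the result with a brute-force search over a feasible grid.

```python
# Toy problem (assumed for illustration):
#   maximize f(x) = -(x - 2)^2  subject to  h(x) = 1 - x >= 0.
def f(x):
    return -(x - 2.0) ** 2

def h(x):
    return 1.0 - x

def df(x):
    return -2.0 * (x - 2.0)

DH = -1.0  # h'(x) is constant

critical_points = []

# Case 1: lambda = 0, so [KT-1] reduces to f'(x) = 0, i.e. x = 2.
x = 2.0
if h(x) >= 0:  # fails here: x = 2 is infeasible
    critical_points.append((x, 0.0))

# Case 2: the constraint binds, h(x) = 0, i.e. x = 1.
x = 1.0
lam = -df(x) / DH  # solves f'(x) + lam * h'(x) = 0
if lam >= 0:
    critical_points.append((x, lam))

# Brute-force check on a grid of feasible points in [-5, 1].
grid = [-5.0 + k / 1000.0 for k in range(6001)]
best_on_grid = max((x for x in grid if h(x) >= 0), key=f)

print(critical_points, best_on_grid)
```

Because the problem is concave and Slater’s condition holds, the single critical point found is already the solution; no comparison of objective values across critical points is needed.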

The situation is less rosy if Slater’s condition is not met. In this case, it remains true
that every critical point of L identifies a solution to the problem, since the conditions
[KT-1] and [KT-2] are still sufficient for a solution. On the other hand, [KT-1] and
[KT-2] are no longer necessary, so the absence of a critical point of L does not enable
us to conclude that no solution exists to the problem. In Example 7.17, for instance,
we have seen that Slater’s condition fails; and although the Lagrangean L admits
no critical points, a solution to the problem does exist. This points to a potentially
serious problem in cases where the existence of solutions cannot be verified a priori;
however, its practical significance is much diminished by the fact that the failure of
Slater’s condition happens only in rare cases.

Finally, a small point. When the constraint functions $h_i$ are all concave, the con-
straint set D is a convex set. Thus, if the objective function f happens to be strictly
concave, there can exist at most one solution to the problem by Theorem 7.14. From 
an applications standpoint, this observation implies that if we succeed in unearthing 
a single critical point of the Lagrangean, we need not search for any others, since the 
problem’s unique solution is already identified. 


7.5 A Proof of the First-Derivative Characterization of Convexity 


We prove Theorem 7.9 for the case where f is concave. The result for convex f then
follows simply by noting that $D(-f) = -Df$, and appealing to Theorem 7.2.
So suppose first that f is concave. We have


$$Df(x)(y - x) = \lim_{t \to 0+} \frac{f(x + t(y - x)) - f(x)}{t} = \lim_{t \to 0+} \frac{f(ty + (1 - t)x) - f(x)}{t}.$$

When $t \in (0, 1)$, the concavity of f implies $f(ty + (1 - t)x) \ge t f(y) + (1 - t) f(x)$,

so by choosing $t > 0$, $t \to 0$, we have


$$Df(x)(y - x) \ge \lim_{t \to 0+} \frac{t f(y) + (1 - t) f(x) - f(x)}{t} = f(y) - f(x),$$


and the result is proved.
Now, suppose that for all $x_1$ and $x_2$ in D we have

$$Df(x_1)(x_2 - x_1) \ge f(x_2) - f(x_1).$$

Pick any x and y in D, and any $\lambda \in (0, 1)$. We will show that we must have

$$f[\lambda x + (1 - \lambda) y] \ge \lambda f(x) + (1 - \lambda) f(y),$$

which will establish that f is concave on D. For expositional convenience, define


$$z = \lambda x + (1 - \lambda) y,$$
$$w = x - z = (1 - \lambda)(x - y).$$


Note that we have


$$y = z - \left(\frac{\lambda}{1 - \lambda}\right) w.$$


By hypothesis, we also have


$$f(x) - f(z) \le Df(z)(x - z) = Df(z) w,$$

and


$$f(y) - f(z) \le Df(z)(y - z) = -\left(\frac{\lambda}{1 - \lambda}\right) Df(z) w.$$


Multiplying the first inequality by $\lambda/(1 - \lambda)$, and adding the two inequalities, we obtain


$$\left(\frac{\lambda}{1 - \lambda}\right) f(x) + f(y) - \left(\frac{1}{1 - \lambda}\right) f(z) \le 0.$$

Rearranging terms after multiplying through by $(1 - \lambda)$, we have


$$\lambda f(x) + (1 - \lambda) f(y) \le f(z) = f[\lambda x + (1 - \lambda) y],$$


which completes the proof. □
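
The first-derivative characterization proved above lends itself to a simple numerical check. The sketch below tests the inequality f(y) - f(x) ≤ Df(x)(y - x) on random pairs for the concave function log x, and shows a single violating pair for x³, which is not concave on the real line. Both test functions, the sampling range, and the tolerance are illustrative assumptions.

```python
import math
import random

def gradient_inequality_holds(f, df, x, y, tol=1e-12):
    """Check f(y) - f(x) <= Df(x)(y - x), up to a small float tolerance."""
    return f(y) - f(x) <= df(x) * (y - x) + tol

random.seed(1)
pairs = [(random.uniform(0.1, 50.0), random.uniform(0.1, 50.0)) for _ in range(10_000)]

# Concave example: f(x) = log x on (0, infinity), with f'(x) = 1/x.
log_ok = all(
    gradient_inequality_holds(math.log, lambda x: 1.0 / x, x, y) for x, y in pairs
)

# Non-concave example: f(x) = x^3, with f'(x) = 3x^2.
# At x = 1, y = 2 the inequality would read 7 <= 3, which is false.
cube_ok = gradient_inequality_holds(lambda x: x ** 3, lambda x: 3.0 * x ** 2, 1.0, 2.0)

print(log_ok, cube_ok)
```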


7.6 A Proof of the Second-Derivative Characterization of Convexity 


For expositional convenience, we prove Theorem 7.10 for the case where the domain 
D of f is all of R”. With minor changes, the proof is easily adapted to include the 
case where D is an arbitrary open and convex subset of R”. The proof requires the 
following preliminary result: 


Lemma 7.19 Let $f: \mathbb{R}^n \to \mathbb{R}$. Given any x and h in $\mathbb{R}^n$, define the function $\varphi_{x,h}(\cdot)$
by $\varphi_{x,h}(t) = f(x + th)$, $t \in \mathbb{R}$. Then,


1. f is concave on $\mathbb{R}^n$ if and only if the function $\varphi_{x,h}(\cdot)$ is concave in t for each
fixed $x, h \in \mathbb{R}^n$.
2. If $\varphi_{x,h}(\cdot)$ is strictly concave in t for each fixed $x, h \in \mathbb{R}^n$ with $h \ne 0$, then f is
strictly concave on $\mathbb{R}^n$.


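Proof of Lemma 7.19 First suppose that f is concave on $\mathbb{R}^n$. Fix any x and h in
$\mathbb{R}^n$. For any pair of real numbers t and t', and any $\alpha \in (0, 1)$, we have


$$\varphi_{x,h}(\alpha t + (1 - \alpha) t') = f(x + \alpha t h + (1 - \alpha) t' h)$$
$$= f(\alpha (x + th) + (1 - \alpha)(x + t'h))$$
$$\ge \alpha f(x + th) + (1 - \alpha) f(x + t'h)$$
$$= \alpha \varphi_{x,h}(t) + (1 - \alpha) \varphi_{x,h}(t'),$$


so $\varphi_{x,h}(\cdot)$ is indeed concave in t.

Next, suppose for any x and h in $\mathbb{R}^n$, $\varphi_{x,h}(\cdot)$ is concave in t. Pick any $z_1, z_2 \in \mathbb{R}^n$
and any $\alpha \in (0, 1)$. Let $z(\alpha) = \alpha z_1 + (1 - \alpha) z_2$. In order to prove that f is concave,
we are required to show that $f(z(\alpha)) \ge \alpha f(z_1) + (1 - \alpha) f(z_2)$ for any $\alpha \in (0, 1)$.

Consider the function $\varphi_{x,h}(\cdot)$ where $x = z_1$, and $h = z_2 - z_1$. Note that $\varphi_{x,h}(0) =
f(z_1)$ and $\varphi_{x,h}(1) = f(z_2)$. Moreover, $\varphi_{x,h}(\alpha) = f(z_1 + \alpha(z_2 - z_1)) = f((1 -
\alpha) z_1 + \alpha z_2)$. Since $\varphi_{x,h}(\cdot)$ is concave by hypothesis, we have for any $\alpha \in (0, 1)$


$$f((1 - \alpha) z_1 + \alpha z_2) = \varphi_{x,h}(\alpha)$$
$$= \varphi_{x,h}((1 - \alpha) 0 + \alpha 1)$$
$$\ge (1 - \alpha) \varphi_{x,h}(0) + \alpha \varphi_{x,h}(1)$$
$$= (1 - \alpha) f(z_1) + \alpha f(z_2).$$


Since $\alpha \in (0, 1)$ was arbitrary, the same inequality holds with the roles of $\alpha$ and
$1 - \alpha$ exchanged, which completes the proof that f is concave.
The proof of Part 2 of the lemma is left as an exercise to the reader. □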
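
Lemma 7.19 reduces concavity in n variables to concavity along every line, which is easy to probe numerically. The sketch below builds the one-variable restriction φ for a given f, x, and h, and tests the midpoint concavity inequality along random lines for a concave function, then exhibits one line along which the product x₁x₂ fails the test. The helper names `phi` and `midpoint_concave`, the two test functions, and the tolerance are illustrative assumptions.

```python
import random

def phi(f, x, h):
    """Return the one-variable function t -> f(x + t h)."""
    return lambda t: f(tuple(xi + t * hi for xi, hi in zip(x, h)))

def midpoint_concave(g, s, t, tol=1e-9):
    """Check g((s + t)/2) >= (g(s) + g(t))/2, up to a small float tolerance."""
    return g(0.5 * (s + t)) >= 0.5 * g(s) + 0.5 * g(t) - tol

def concave_f(v):
    # f(x1, x2) = -(x1^2 + x2^2), concave on R^2.
    return -(v[0] ** 2 + v[1] ** 2)

def product_f(v):
    # f(x1, x2) = x1 * x2, not concave on R^2.
    return v[0] * v[1]

random.seed(2)

def rnd():
    return random.uniform(-5.0, 5.0)

# Every sampled line restriction of concave_f passes the midpoint test.
lines_ok = all(
    midpoint_concave(phi(concave_f, (rnd(), rnd()), (rnd(), rnd())), rnd(), rnd())
    for _ in range(5_000)
)

# Along x = (0, 0), h = (1, 1) the restriction of product_f is t -> t^2.
bad_line = phi(product_f, (0.0, 0.0), (1.0, 1.0))
bad_ok = midpoint_concave(bad_line, -1.0, 1.0)  # compares 0 with 1

print(lines_ok, bad_ok)
```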


Proof of Theorem 7.10 It is easy to see that for any $C^2$ function $g: \mathbb{R}^n \to \mathbb{R}$, the
matrix of cross-partials $D^2 g(x)$ at a point x is negative semidefinite (resp. nega-
tive definite) if and only if $D^2(-g)(x)$ is positive semidefinite (resp. positive definite).
Therefore, Parts 2 and 4 of the theorem will be proved if we prove Parts 1 and 3. We
concentrate on establishing Parts 1 and 3 here. Once again, we resort to a bootstrap-
ping argument; we establish the theorem first for the case n = 1 (i.e., $f: \mathbb{R} \to \mathbb{R}$),
and then use this to prove the general case.


Case 1: n = 1


Suppose first that $f: \mathbb{R} \to \mathbb{R}$ is a $C^2$ concave function. We will show that $D^2 f$ is
negative semidefinite at all $x \in \mathbb{R}$, i.e., that $f''(x) \le 0$ at all $x \in \mathbb{R}$.

Let $x, y \in \mathbb{R}$ and suppose $x < y$. Pick sequences $\{x_k\}$ and $\{y_k\}$ in $\mathbb{R}$ so that for
each k we have $x < x_k < y_k < y$, and $x_k \to x$, $y_k \to y$. By repeating the arguments
of Theorem 7.5, it is seen that the following inequalities hold at each k:


$$\frac{f(x) - f(x_k)}{x - x_k} \ge \frac{f(x_k) - f(y_k)}{x_k - y_k} \ge \frac{f(y_k) - f(y)}{y_k - y}.$$


When $k \to \infty$ the left-most term in this expression converges to $f'(x)$, while the
right-most term converges to $f'(y)$, so we must have $f'(x) \ge f'(y)$. Since x and y
were arbitrary points that satisfied $x < y$, this says $f'$ is a nonincreasing function
on $\mathbb{R}$. If $f'$ is nonincreasing, its derivative $f''$ must satisfy $f''(x) \le 0$ at all $x \in \mathbb{R}$,
which is precisely the statement that $f''$ is negative semidefinite at all x.

Now suppose $f: \mathbb{R} \to \mathbb{R}$ satisfies $f''(x) \le 0$ at all $x \in \mathbb{R}$. We will show that f
is concave on $\mathbb{R}$, completing the proof of Part 1 of the theorem for the case n = 1.
Pick any x and y in $\mathbb{R}$ and assume, without loss of generality, that $x < y$. Pick any
$\lambda \in (0, 1)$, and let $z = \lambda x + (1 - \lambda) y$. We are to show that $f(z) \ge \lambda f(x) + (1 - \lambda) f(y)$.

By the Mean Value Theorem (see Theorem 1.71 in Chapter 1), there exist points
$w_1$ and $w_2$ such that $w_1 \in (x, z)$, $w_2 \in (z, y)$, and

$$\frac{f(x) - f(z)}{x - z} = f'(w_1), \qquad \frac{f(z) - f(y)}{z - y} = f'(w_2).$$

Since $w_1 < w_2$ and $f'' \le 0$, we must have $f'(w_1) \ge f'(w_2)$. Using this in the
expression above, we obtain


$$\frac{f(x) - f(z)}{x - z} \ge \frac{f(z) - f(y)}{z - y}.$$


Cross-multiplying and rearranging terms, we finally obtain


$$f(z) \ge \frac{y - z}{y - x} f(x) + \frac{z - x}{y - x} f(y).$$

Since $z = \lambda x + (1 - \lambda) y$, we have $\lambda = (y - z)/(y - x)$ and $1 - \lambda = (z - x)/(y - x)$.
Substituting in the inequality above, the proof of the concavity of f is complete. This
establishes Part 1 of the theorem for the case n = 1.

Note that if we had $f''(x) < 0$ at all x, then we would have also had $f'(w_1) > f'(w_2)$,
so all the inequalities in the above argument become strict. In particular, retracing
the steps (but with the strict inequality) establishes that f is then strictly concave,
completing the proof of Part 3 of the theorem for the case n = 1.


Case 2: n > 1

We now turn to the general case where $f: \mathbb{R}^n \to \mathbb{R}$ for some n > 1. Lemma 7.19
will be repeatedly invoked in this step.
We begin with Part 1. Suppose first that f is concave. Pick any x and h in $\mathbb{R}^n$. We
are required to show that $h' D^2 f(x) h \le 0$. Define $\varphi_{x,h}(t) = f(x + th)$. Since f is
$C^2$ by hypothesis, so is $\varphi_{x,h}(\cdot)$, and in fact, we have $\varphi'_{x,h}(t) = Df(x + th) \cdot h$, and
$\varphi''_{x,h}(t) = h' D^2 f(x + th) h$. Moreover, by Lemma 7.19, $\varphi_{x,h}(\cdot)$ is concave in t, so
we have $\varphi''_{x,h}(t) \le 0$ for all t, by the result established for concave functions of one
variable. Therefore, $h' D^2 f(x + th) h \le 0$ for all t, and in particular, $h' D^2 f(x) h \le 0$,
as required.

Now suppose that $D^2 f(z)$ is negative semidefinite at all z. We will show that f
is concave by showing that for any x and h in $\mathbb{R}^n$, the function $\varphi_{x,h}(t) = f(x + th)$
is concave in t. Indeed, since f is $C^2$, we again have $\varphi'_{x,h}(t) = Df(x + th) \cdot h$, and
$\varphi''_{x,h}(t) = h' D^2 f(x + th) h$. Since $D^2 f$ is negative semidefinite at all points, it is
the case that $h' D^2 f(x + th) h \le 0$. Therefore, $\varphi''_{x,h}(t) \le 0$ everywhere, so by the
result established for concave functions of one variable, $\varphi_{x,h}(\cdot)$ is concave. Since x
and h were arbitrary, an appeal to Lemma 7.19 establishes that f is also concave,
completing the proof of Part 1 of the theorem.

To see Part 3, suppose that $D^2 f(z)$ were negative definite at all z. Pick any x
and h in $\mathbb{R}^n$ with $h \ne 0$. Let $\varphi_{x,h}(t) = f(x + th)$. As above, the twice-continuous
differentiability of f implies the same property for $\varphi_{x,h}(\cdot)$, and we have $\varphi''_{x,h}(t) =
h' D^2 f(x + th) h$ for any t. Since $h \ne 0$ and $D^2 f$ is negative definite everywhere,
it follows that $\varphi''_{x,h}(t) < 0$ at all t. From the result established in the case n = 1,
it follows that $\varphi_{x,h}(\cdot)$ is strictly concave in t. Since x and $h \ne 0$ were arbitrary,
Lemma 7.19 implies that f is also strictly concave. □
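
In practice, the second-derivative test is applied by inspecting the eigenvalues (or principal minors) of the Hessian. The sketch below does this for a 2 × 2 symmetric matrix using the closed-form eigenvalue formula. The test function f(x, y) = -x² + xy - y², whose Hessian is constant, and the helper name `eigenvalues_2x2_symmetric` are illustrative assumptions, not taken from the text.

```python
import math

def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    return (mean - radius, mean + radius)

# Illustrative function: f(x, y) = -x^2 + x*y - y^2.
# Its Hessian is the constant matrix [[-2, 1], [1, -2]].
def f(x, y):
    return -x ** 2 + x * y - y ** 2

eigs = eigenvalues_2x2_symmetric(-2.0, 1.0, -2.0)
negative_definite = all(e < 0 for e in eigs)

# Spot check of strict concavity at the midpoint of (1, 0) and (0, 1):
# f(0.5, 0.5) = -0.25 exceeds the average of the endpoint values, -1.
strict_at_midpoint = f(0.5, 0.5) > 0.5 * f(1.0, 0.0) + 0.5 * f(0.0, 1.0)

print(eigs, negative_definite, strict_at_midpoint)
```

Both eigenvalues are negative, so the Hessian is negative definite everywhere and Part 3 of the theorem implies that this f is strictly concave.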


7.7 A Proof of the Theorem of Kuhn and Tucker under Convexity 


We first present a result that can be viewed as an abstract version of Theorem 7.16. 
Then, we will use this result to prove Theorem 7.16. 

A definition first. Let $x \in D \subset \mathbb{R}^n$, and $y \in \mathbb{R}^n$. We will say that y points into D
at x if there is $\omega > 0$ such that for all $\eta \in (0, \omega)$, we have $(x + \eta y) \in D$.


Theorem 7.20 Suppose $f: D \to \mathbb{R}$ is concave, where $D \subset \mathbb{R}^n$ is convex. Then $x^*$
maximizes f over D if, and only if, $Df(x^*; y) \le 0$ for all y pointing into D at $x^*$.


Proof Suppose $x^*$ maximizes f over D. Let y point into D at $x^*$. Then, for all $\eta > 0$
such that $(x^* + \eta y) \in D$, we have $f(x^*) \ge f(x^* + \eta y)$ since $x^*$ is a maximizer.
Subtracting $f(x^*)$ from both sides, dividing by $\eta$, and taking limits as $\eta \to 0+$
establishes necessity.⁴

⁴ Note that, as with all necessary conditions, the concavity of f and the convexity of D played no role here.

Conversely, suppose $Df(x^*; y) \le 0$ for all y pointing into D at $x^*$. If $x^*$ does not
maximize f over D, there exists $z \in D$ with $f(z) > f(x^*)$. Let $y = z - x^*$. Then,
$x^* + y = z$, so by taking $\omega = 1$, it follows from the convexity of D that y points
into D at $x^*$. But for $\eta \in (0, 1)$,


$$f(x^* + \eta(z - x^*)) \ge (1 - \eta) f(x^*) + \eta f(z) = f(x^*) + \eta (f(z) - f(x^*))$$

by concavity of f. Thus,


$$\frac{f(x^* + \eta y) - f(x^*)}{\eta} \ge f(z) - f(x^*) > 0.$$


But the left-hand side tends to $Df(x^*; y)$ as $\eta \to 0+$, and this implies $Df(x^*; y) > 0$.
By hypothesis, on the other hand, $Df(x^*; y) \le 0$ since y points into D at $x^*$, a
contradiction establishing the theorem. □
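
Theorem 7.20 does not require differentiability, only directional derivatives, so it applies to concave functions with kinks. The sketch below approximates directional derivatives by one-sided difference quotients for the concave function f(x) = -|x - 0.5| on D = [0, 1]. The function, the step size `eta`, and the helper name `one_sided_quotient` are illustrative assumptions. At the maximizer 0.5 both directions that point into D give a quotient close to -1; at the non-maximizer 0.2 the direction +1 gives a quotient close to +1, so the condition of the theorem fails there.

```python
def f(x):
    # Concave on [0, 1] with a kink (and its maximum) at x = 0.5.
    return -abs(x - 0.5)

def one_sided_quotient(f, x, y, eta=1e-6):
    """Approximate the directional derivative Df(x; y) by a forward quotient."""
    return (f(x + eta * y) - f(x)) / eta

# At the maximizer, both directions +1 and -1 point into D = [0, 1].
at_max = [one_sided_quotient(f, 0.5, y) for y in (1.0, -1.0)]

# At x = 0.2 the direction +1 points into D and f increases along it.
at_other = one_sided_quotient(f, 0.2, 1.0)

print(at_max, at_other)
```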


We begin our proof of Theorem 7.16 by demonstrating the sufficiency of the
conditions [KT-1] and [KT-2] when f and the functions $h_i$ are all concave. (As we
have already mentioned, Slater’s condition plays no part in proving sufficiency.) Our
proof will use the fact that if a function $\varphi: U \to \mathbb{R}$ is differentiable at a point x, then
the directional derivative $D\varphi(x; h)$ exists at all $h \in \mathbb{R}^n$, and, in fact, $D\varphi(x; h) =
D\varphi(x) h$.

So suppose that there exists $\lambda^* \in \mathbb{R}^l_+$ such that [KT-1] and [KT-2] hold. Let


$$D_i = \{x \in U \mid h_i(x) \ge 0\}.$$


Suppose $x_1, x_2 \in D_i$. Pick any $\lambda \in (0, 1)$, and let $z = \lambda x_1 + (1 - \lambda) x_2$. Then, $z \in U$
since U is convex. Moreover, $h_i(z) \ge \lambda h_i(x_1) + (1 - \lambda) h_i(x_2) \ge 0$, so $z \in D_i$.
Thus, $D_i$ is convex for each i. This implies $D = \bigcap_{i=1}^{l} D_i$ is also convex. Since f is
concave, all that remains to be shown now is that $Df(x^*) y \le 0$ for all y pointing
into D at $x^*$, and we can then appeal to Theorem 7.20.

So suppose y points into D at $x^*$. Fix y. We will show that for each $i = 1, \ldots, l$,
we have $\lambda_i^* Dh_i(x^*) y \ge 0$. First, note that by definition of y, there is $\epsilon > 0$ such that
for all $t \in (0, \epsilon)$, we have $(x^* + ty) \in D$. This implies $h_i(x^* + ty) \ge 0$ for all i, for
all $t \in (0, \epsilon)$.

Pick any i. There are two possibilities: $h_i(x^*) > 0$, and $h_i(x^*) = 0$. In the first
case, $\lambda_i^* = 0$ by condition [KT-2], so certainly, $\lambda_i^* Dh_i(x^*) y \ge 0$. Now, consider the
case $h_i(x^*) = 0$. By hypothesis, we have $h_i(x^* + ty) \ge 0$ for all $t \in (0, \epsilon)$, so we
also have

$$\frac{h_i(x^* + ty) - h_i(x^*)}{t} \ge 0$$


for all $t \in (0, \epsilon)$. Taking limits as $t \downarrow 0$, we obtain $Dh_i(x^*) y \ge 0$. Since $\lambda_i^* \ge 0$, we
finally obtain $\lambda_i^* Dh_i(x^*) y \ge 0$ in this case also. It now follows that


$$Df(x^*) y = -\sum_{i=1}^{l} \lambda_i^* Dh_i(x^*) y \le 0.$$


Since y was an arbitrary vector pointing into D at $x^*$, this inequality holds for all
such y. By Theorem 7.20, $x^*$ is then a maximum of f on D, completing the proof
of the sufficiency of [KT-1] and [KT-2].

We now turn to necessity. Unlike the sufficiency part, Slater’s condition will play
an important role here. So suppose that $x^*$ is a maximum of f on the given constraint
set D. We are to show the existence of $\lambda^*$ such that [KT-1] and [KT-2] hold. Define
the function $L: U \times \mathbb{R}^l \to \mathbb{R}$ by


$$L(x, \lambda) = f(x) + \sum_{i=1}^{l} \lambda_i h_i(x).$$


To prove the result, we will show that there is $\lambda^* \in \mathbb{R}^l_+$ which satisfies

$$\lambda_i^* h_i(x^*) = 0, \quad i = 1, \ldots, l,$$

as well as⁵

$$L(x, \lambda^*) \le L(x^*, \lambda^*), \quad x \in U.$$

Since $\lambda^* \ge 0$, the first of these conditions establishes [KT-2]. The second condition
states that $x^*$ is a maximum of $L(\cdot, \lambda^*)$ on U. Since U is open and convex, and $L(\cdot, \lambda^*)$
is concave in x, $x^*$ can be a maximum of $L(\cdot, \lambda^*)$ on U if and only if $DL_x(x^*, \lambda^*) = 0$,
i.e., if and only if


$$Df(x^*) + \sum_{i=1}^{l} \lambda_i^* Dh_i(x^*) = 0.$$


Thus, [KT-1] must also hold, and the proof will be complete.
We will utilize a separation theorem to derive the required point $\lambda^*$. To this end,
define the sets X and Y by


$$X = \{(w, z) \in \mathbb{R} \times \mathbb{R}^l \mid w \le f(x),\ z \le h(x) \text{ for some } x \in U\},$$

and

$$Y = \{(w, z) \in \mathbb{R} \times \mathbb{R}^l \mid w > f(x^*),\ z > 0\}.$$

We claim that $X \cap Y$ is empty. For if there were a point (w, z) in this intersection,
we would have the existence of an x in U such that $f(x) \ge w > f(x^*)$ and
$0 < z \le h(x)$, so the point x is feasible and dominates $x^*$. This contradicts the
presumed optimality of $x^*$. Thus, X and Y have no points in common, as claimed.

⁵ One can actually show that the following stronger “saddle-point condition” is met at $(x^*, \lambda^*)$:

$$L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda), \quad x \in U,\ \lambda \in \mathbb{R}^l_+.$$

It is also true that X and Y are convex sets. The convexity of Y is obvious. The
convexity of X follows from the concavity of f and the constraint functions $h_i$.
By Theorem 1.68, there is a vector $(p, q) \in \mathbb{R} \times \mathbb{R}^l$, $(p, q) \ne 0$, such that


$$pw + q \cdot z \le pu + q \cdot v, \quad (w, z) \in X,\ (u, v) \in Y.$$


From the definition of Y, we must have $(p, q) \ge 0$. For, if some coordinate was
negative, by taking the corresponding coordinate of (u, v) positive and large, the
sum $pu + q \cdot v$ could be made arbitrarily negative. It would not, then, be possible to
satisfy the inequality required by the separation.

Now, by taking a sequence $(u_m, v_m) \in Y$ converging to the boundary point
$(f(x^*), 0)$, it also follows from the separation inequality and the definition of X
that


$$p f(x) + q \cdot h(x) \le p f(x^*), \quad x \in U.$$


We have already established that $(p, q) \ge 0$. We now claim that p = 0 leads to a
contradiction. If p = 0, then from the last inequality, we must have


$$q \cdot h(x) \le 0, \quad x \in U.$$


But p = 0 combined with $(p, q) \ge 0$ and $(p, q) \ne 0$ implies $q > 0$ (i.e., that
$q = (q_1, \ldots, q_l)$ has at least one positive coordinate). By Slater’s condition, there
is $\bar{x} \in U$ such that $h_i(\bar{x}) > 0$ for all i. Together with $q > 0$, this means $q \cdot h(\bar{x}) > 0$, a
contradiction. Therefore, we cannot have p = 0.


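Now define

$$\lambda^* = \frac{q}{p} = \left(\frac{q_1}{p}, \ldots, \frac{q_l}{p}\right) \ge 0.$$


We then have


$$f(x) + \sum_{i=1}^{l} \lambda_i^* h_i(x) \le f(x^*), \quad x \in U.$$


If we take $x = x^*$ in this expression, we obtain $\sum_{i=1}^{l} \lambda_i^* h_i(x^*) \le 0$. On the other
hand, we also have $h(x^*) \ge 0$ and $\lambda^* \ge 0$, so $\sum_{i=1}^{l} \lambda_i^* h_i(x^*) \ge 0$. Together, these
inequalities imply


$$\sum_{i=1}^{l} \lambda_i^* h_i(x^*) = 0.$$

Since each term $\lambda_i^* h_i(x^*)$ is nonnegative, each term must itself be zero. It also
follows that for all $x \in U$,


$$L(x, \lambda^*) = f(x) + \sum_{i=1}^{l} \lambda_i^* h_i(x) \le f(x^*) = f(x^*) + \sum_{i=1}^{l} \lambda_i^* h_i(x^*) = L(x^*, \lambda^*).$$


The proof is complete.⁶ □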


7.8 Exercises 


1. Define $f: \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) = ax^2 + by^2 + 2cxy + d$. For what values of a,
b, c, and d is f concave?


2. Let $f: \mathbb{R}^n_{++} \to \mathbb{R}$ be defined by
$$f(x_1, \ldots, x_n) = \log(x_1^\alpha \cdots x_n^\alpha),$$
where $\alpha > 0$. Is f concave?


3. Let $f: \mathbb{R}^n_+ \to \mathbb{R}$ be a concave function satisfying $f(0) = 0$. Show that for all
$k \ge 1$ we have $k f(x) \ge f(kx)$. What happens if $k \in [0, 1)$?


4. Let $D = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 1\}$ be the unit disk in $\mathbb{R}^2$. Give an example of
a concave function $f: D \to \mathbb{R}$ such that f is discontinuous at every boundary
point of D.


5. Show that the linear function $f: \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = a \cdot x - b$, $a \in \mathbb{R}^n$,
$b \in \mathbb{R}$, is both convex and concave on $\mathbb{R}^n$. Conversely, show that if $f: \mathbb{R}^n \to \mathbb{R}$
is both convex and concave, then it is a linear function.


6. Let $\{f_1, f_2, \ldots, f_n\}$ be a set of convex functions from $\mathbb{R}^n$ to $\mathbb{R}$. Show that the
nonnegative linear combination


$$f(x) = a_1 f_1(x) + \cdots + a_n f_n(x), \quad a_1, \ldots, a_n \ge 0,$$
is convex. Is this still true for any linear combination of convex functions?


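⁶ To see that $(x^*, \lambda^*)$ is actually a saddle point of L as claimed in the earlier footnote, note that since
$\sum_{i=1}^{l} \lambda_i^* h_i(x^*) = 0$, we also have $\sum_{i=1}^{l} \lambda_i h_i(x^*) \ge \sum_{i=1}^{l} \lambda_i^* h_i(x^*)$ for any $\lambda \ge 0$. Therefore,


$$L(x^*, \lambda^*) = f(x^*) + \sum_{i=1}^{l} \lambda_i^* h_i(x^*) \le f(x^*) + \sum_{i=1}^{l} \lambda_i h_i(x^*) = L(x^*, \lambda).$$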


7. Show that a function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if and only if for each $x_1, x_2 \in \mathbb{R}^n$,
the function $g: [0, 1] \to \mathbb{R}$ defined by


$$g(\lambda) = f[\lambda x_1 + (1 - \lambda) x_2]$$


is convex on [0, 1]. Is the above statement still true if we replace convex with
concave?


8. Let $\{f_i : i \in I\}$ be a set (finite or infinite) of functions from a convex set $D \subset \mathbb{R}^n$
to $\mathbb{R}$ which are convex and bounded on D. Show that the function f, defined as
$$f(x) = \sup_{i \in I} f_i(x),$$
is a convex function on D. What about the function g, defined as
$$g(x) = \inf_{i \in I} f_i(x)?$$
Is g convex? Why or why not?


9. Let $D \subset \mathbb{R}^n$ be a convex set. Let $f: D \to \mathbb{R}$ be a differentiable function. Show
that the following are equivalent:

(a) f is concave on D.

(b) $f(y) - f(x) \le Df(x)(y - x)$ for all $x, y \in D$.

(c) $[Df(y) - Df(x)](y - x) \le 0$ for all $x, y \in D$.


10. Let $f: \mathbb{R}^n \to \mathbb{R}$ be concave. Let A be an $n \times m$ matrix, and let $b \in \mathbb{R}^n$. Consider
the function $h: \mathbb{R}^m \to \mathbb{R}$ defined by
$$h(x) = f[Ax + b], \quad x \in \mathbb{R}^m.$$
Is h concave? Why or why not?


11. Let f and g be concave functions on $\mathbb{R}$. Give an example to show that their
composition $f \circ g$ is not necessarily concave. Show also that if f is an increasing
concave function and g is any concave function, then $f \circ g$ will also be concave.
What if instead, g were increasing and concave, and f were simply concave?


12. Let f and g be concave functions on $\mathbb{R}$. Is the product function $f \cdot g$ concave on
$\mathbb{R}$? Prove your answer, or provide a counterexample.


13. Let $f: [0, 2] \to \mathbb{R}$ be defined by


$$f(x) = \begin{cases} x, & x \in [0, 1] \\ 2 - x, & x \in (1, 2]. \end{cases}$$


Show that f is concave on [0, 2]. Observe that $x^* = 1$ is a global maximizer of
f. Let V be the set of all y that point into [0, 2] at $x^*$. Show that $Df(x^*; y) \le 0$
for all $y \in V$.

14. Repeat problem 13 if $f(x) = x$, and if $f(x) = x(1 - x)$.


15. Describe a set of conditions on the parameters p and I under which the budget set
B(p, I) of the utility-maximization problem of subsection 2.3.1 meets Slater’s
condition.

16. Under what conditions on p and w does the constraint set F(p, w) of the
consumption-leisure choice problem of subsection 2.3.5 meet Slater’s condi-
tion?

17. Identify a set of conditions on the parameters p and w of the portfolio choice
problem of subsection 2.3.6 under which the constraint set $\Phi(p, w)$ of the prob-
lem satisfies Slater’s condition.

18. Find a set of sufficient conditions on the technology g under which the constraint 
set of the cost-minimization problem of subsection 2.3.4 meets Slater’s condition. 


19. Let T be any positive integer. Consider the following problem:


$$\text{Maximize } \sum_{t=1}^{T} u(c_t)$$
$$\text{subject to } c_1 + x_1 \le x,$$
$$c_t + x_t \le f(x_{t-1}), \quad t = 2, \ldots, T,$$
$$c_t, x_t \ge 0, \quad t = 1, \ldots, T,$$


where $x \in \mathbb{R}_+$, and u and f are nondecreasing continuous functions from $\mathbb{R}_+$
into itself. Derive the Kuhn-Tucker first-order conditions for this problem, and
explain under what circumstances these conditions are sufficient.


20. A firm produces an output y using two inputs $x_1$ and $x_2$ as $y = \sqrt{x_1 x_2}$. The firm
is obligated to use at least one unit of $x_1$ in its production process. The input
prices of $x_1$ and $x_2$ are given by $w_1$ and $w_2$, respectively. Assume that the firm
wishes to minimize the cost of producing y units of output.


(a) Set up the firm’s cost-minimization problem. Is the feasible set closed? com- 
pact? convex? 

(b) Describe the Kuhn—Tucker first-order conditions. Are they sufficient for a 
solution? Why or why not? 


21. Describe conditions on f under which the first-order conditions in the following 
optimization problem are sufficient: 


$$\text{Maximize } p f(x_1, \ldots, x_n) - w_1 x_1 - \cdots - w_n x_n$$


$$\text{subject to } x_i \ge 0, \quad i = 1, \ldots, n.$$


22. A firm has contracted with its union to hire at least $L^*$ units of labor at a wage rate
of $w_1$ per unit. Any amount of additional labor may be hired at a rate of $w_2$ per
unit, where $w_1 > w_2$. Assume that labor is the only input used by the firm in its
production process, and that the production function is given by $f: \mathbb{R}_+ \to \mathbb{R}_+$,
where f is $C^1$ and concave. Given that the output sells at a price of p, describe
the firm’s maximization problem. Derive the Kuhn-Tucker first-order conditions.
Are these necessary for a solution? Are they sufficient?


23. A firm uses two inputs, labor (l) and raw material (m), to produce a single output
y. The production function is given by $y = f(l, m)$. The output sells for a price
of p, while labor has a unit cost of w. The firm has in its stock 4 units of the raw
material m. Additional units may be purchased from the market at a price of c.
The firm can also sell part or all of its own raw material stock in the market at
the price c.


(a) Set up the firm’s profit-maximization problem, and derive the Kuhn-Tucker
first-order conditions. Describe under what conditions on f these conditions
are sufficient.

(b) Let $p = w = m = 1$, and let $f(l, m) = l^{1/3} m^{1/3}$. Calculate the firm’s
optimal choice of actions.


24. A firm produces an output y using a single input l through the production function
$y = f(l)$. The output is sold at a constant price of p. The firm is a monopsonist
in the labor market: if it offers a wage of w, then it can hire $\lambda(w)$ units of labor,
where $\lambda(0) = 0$, $\lambda'(w) > 0$ at all $w \ge 0$, and $\lambda(w) \uparrow \infty$ as $w \uparrow \infty$. Assume the
usual nonnegativity conditions.


(a) Describe the firm’s profit-maximization problem, and write down the Kuhn-
Tucker first-order conditions for a maximum.

(b) Explain under what further assumptions on $f(\cdot)$ and $\lambda(\cdot)$ these first-order
conditions are also sufficient. (Specify the most general conditions you can
think of.)


25. An agent who consumes three commodities has a utility function given by


$$u(x_1, x_2, x_3) = x_1^{1/3} + \min\{x_2, x_3\}.$$


Given an income of I, and prices of $p_1, p_2, p_3$, write down the consumer’s utility-
maximization problem. Can the Weierstrass and/or Kuhn-Tucker theorems be
used to obtain and characterize a solution? Why or why not?


26. An agent consumes two commodities x and y. His utility function is given by
$u(x, y) = x + \ln(1 + y)$, where $\ln(z)$ denotes the natural logarithm of z. The
prices of the two commodities are given by $p_x > 0$ and $p_y > 0$. The consumer
has an income of $I > 0$. Assuming consumption of either commodity must be
nonnegative, find the consumer’s utility-maximizing commodity bundle.


27. A consumer with a fixed income of $I > 0$ consumes two commodities. If he
purchases $q_i$ units of commodity i (i = 1, 2), the price he pays is $p_i(q_i)$, where
$p_i(\cdot)$ is a strictly increasing $C^1$ function. The consumer’s utility function is given
by $u(q_1, q_2) = \ln q_1 + \ln q_2$.


(a) Describe the consumer’s utility maximization problem. Does the Weierstrass
Theorem apply to yield existence of a maximum? Write down the Kuhn-
Tucker first-order conditions for a maximum.

(b) Under what conditions on $p_1(\cdot)$ and $p_2(\cdot)$ are these conditions also sufficient?
(Specify the most general conditions you can think of.)

(c) Suppose $p_1(q_1) = \sqrt{q_1}$ and $p_2(q_2) = \sqrt{q_2}$. Are the sufficient conditions
you have given met by this specification? Calculate the optimal consumption
bundle in this case.

Quasi-Convexity and Optimization 


The previous chapter showed that convexity carries powerful implications for opti- 
mization theory. However, from the point of view of applications, convexity is often 
also quite restrictive as an assumption. For instance, such a commonly used utility
function as the Cobb-Douglas function


$$u(x_1, \ldots, x_n) = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$$


is not concave unless $\sum_{i=1}^{n} \alpha_i \le 1$. In this chapter, we examine optimization under
a weakening of the condition of convexity, which is called quasi-convexity.
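
The claim about the Cobb-Douglas function can be illustrated numerically. The sketch below builds two-good Cobb-Douglas functions for two exponent vectors (both chosen for illustration, as are the sampling range and tolerance) and tests the midpoint concavity inequality. With exponents (1, 1), which sum to 2, a single pair of points already violates concavity; with exponents (0.3, 0.5), which sum to 0.8, no violation is found on the random sample. The last check tests the weaker inequality u(midpoint) ≥ min{u(x), u(y)}, which is the kind of property quasi-concavity retains, for the exponents (1, 1).

```python
import random

def cobb_douglas(alphas):
    """Return u(x) = x_1^a_1 * ... * x_n^a_n for the given exponents."""
    def u(x):
        value = 1.0
        for xi, ai in zip(x, alphas):
            value *= xi ** ai
        return value
    return u

def midpoint(x, y):
    return tuple(0.5 * (xi + yi) for xi, yi in zip(x, y))

def concave_at_midpoint(u, x, y, tol=1e-9):
    # Midpoint version of the concavity inequality, with a float tolerance.
    return u(midpoint(x, y)) >= 0.5 * u(x) + 0.5 * u(y) - tol

u_big = cobb_douglas((1.0, 1.0))    # exponents sum to 2 > 1
u_small = cobb_douglas((0.3, 0.5))  # exponents sum to 0.8 <= 1

# u_big(2, 2) = 4, while the average of u_big(1, 1) = 1 and u_big(3, 3) = 9 is 5.
violates = not concave_at_midpoint(u_big, (1.0, 1.0), (3.0, 3.0))

random.seed(3)

def point():
    return (random.uniform(0.1, 10.0), random.uniform(0.1, 10.0))

pairs = [(point(), point()) for _ in range(10_000)]
small_ok = all(concave_at_midpoint(u_small, x, y) for x, y in pairs)
quasi_ok = all(
    u_big(midpoint(x, y)) >= min(u_big(x), u_big(y)) - 1e-9 for x, y in pairs
)

print(violates, small_ok, quasi_ok)
```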

Quasi-concave and quasi-convex functions fail to exhibit many of the sharp prop- 
erties that distinguish concave and convex functions. A quasi-concave function may, 
for instance, be discontinuous on the interior of its domain. A local maximum of a 
quasi-concave function need not also be a global maximum of the function. Perhaps 
more significantly, first-order conditions are not, in general, sufficient to identify 
global optima of quasi-convex optimization problems. 

Nonetheless, quasi-concave and quasi-convex functions do possess enough struc- 
ture to be of value for optimization theory. Most importantly, it turns out that the 
Kuhn—Tucker first-order conditions are “almost” sufficient to identify optima of 
inequality-constrained optimization problems under quasi-convexity restrictions: 
more precisely, the first-order conditions are sufficient provided the optimum occurs 
at a point where an additional regularity condition is also met. The result cannot, 
unfortunately, be strengthened to obtain unconditional sufficiency of the first-order 
conditions: it is easy to construct examples of otherwise well-behaved quasi-convex 
optimization problems where the regularity condition fails, and as a consequence, 
the first-order conditions do not suffice to identify optima. However, the regularity
condition is not a very restrictive one, and is satisfied in many models of economic
interest.

Finally, it must be mentioned that technical expedience is not the only, or even the
primary, ground for the significance of quasi-convex structures in economic analy- 
sis. As Arrow and Enthoven (1961) demonstrate, the quasi-concavity of a function 
turns out to be, under certain conditions, the precise mathematical expression of the 
economic concept of a diminishing marginal rate of substitution. The latter is an 
assumption that is frequently imposed on economic models, and is usually justi- 
fied by an appeal to economic considerations alone. That such functions also yield 
sufficiency of the first-order conditions can be viewed as simply an added bonus. 


8.1 Quasi-Concave and Quasi-Convex Functions 
Throughout this chapter, $D$ will denote a convex set in $\mathbb{R}^n$. Let $f: D \to \mathbb{R}$. The
upper-contour set of $f$ at $a \in \mathbb{R}$, denoted $U_f(a)$, is defined as

$$U_f(a) = \{x \in D \mid f(x) \ge a\},$$

while the lower-contour set of $f$ at $a \in \mathbb{R}$, denoted $L_f(a)$, is defined as

$$L_f(a) = \{x \in D \mid f(x) \le a\}.$$

The function $f$ is said to be quasi-concave on $D$ if $U_f(a)$ is a convex set for each
$a$. It is said to be quasi-convex on $D$ if $L_f(a)$ is a convex set for each $a$.

As with concave and convex functions, it is also true in the case of quasi-concave
and quasi-convex functions that a strong relationship exists between the value of a
function at two points $x$ and $y$, and the value of the function at a convex combination
$\lambda x + (1 - \lambda) y$:


Theorem 8.1 A function $f: D \to \mathbb{R}$ is quasi-concave on $D$ if and only if for all
$x, y \in D$ and for all $\lambda \in (0, 1)$, it is the case that

$$f[\lambda x + (1 - \lambda) y] \ge \min\{f(x), f(y)\}.$$

The function $f$ is quasi-convex on $D$ if and only if for all $x, y \in D$ and for all
$\lambda \in (0, 1)$, it is the case that

$$f[\lambda x + (1 - \lambda) y] \le \max\{f(x), f(y)\}.$$


Proof First, suppose that $f$ is quasi-concave, i.e., that $U_f(a)$ is a convex set for
each $a \in \mathbb{R}$. Let $x, y \in D$ and $\lambda \in (0, 1)$. Assume, without loss of generality, that
$f(x) \ge f(y)$. Letting $f(y) = a$, we have $x, y \in U_f(a)$. By the convexity of $U_f(a)$,
we have $\lambda x + (1 - \lambda) y \in U_f(a)$, which means

$$f[\lambda x + (1 - \lambda) y] \ge a = f(y) = \min\{f(x), f(y)\}.$$

Now, suppose we have $f[\lambda x + (1 - \lambda) y] \ge \min\{f(x), f(y)\}$ for all $x, y \in D$
and for all $\lambda \in (0, 1)$. Let $a \in \mathbb{R}$. If $U_f(a)$ is empty or contains only one point,
it is evidently convex, so suppose it contains at least two points $x$ and $y$. Then
$f(x) \ge a$ and $f(y) \ge a$, so $\min\{f(x), f(y)\} \ge a$. Now, for any $\lambda \in (0, 1)$, we have
$f[\lambda x + (1 - \lambda) y] \ge \min\{f(x), f(y)\}$ by hypothesis, and so $\lambda x + (1 - \lambda) y \in U_f(a)$.
Since $a$ was arbitrary, the proof is complete for the case of quasi-concave functions.

An analogous argument shows that the claimed result is also true in the case of
quasi-convex functions. □
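
The inequality in Theorem 8.1 lends itself to a quick numerical spot check. The sketch below is illustrative only: the helper name `is_quasi_concave_on_samples` and the two test functions are assumptions introduced here, not objects from the text, and random sampling can only detect violations, never prove quasi-concavity.

```python
import random


def is_quasi_concave_on_samples(f, lo, hi, trials=10_000, seed=0, tol=1e-12):
    """Return True if no sampled triple (x, y, lam) violates
    f(lam*x + (1-lam)*y) >= min(f(x), f(y)) on the interval [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        lam = rng.uniform(0.0, 1.0)
        z = lam * x + (1.0 - lam) * y
        if f(z) < min(f(x), f(y)) - tol:
            return False  # found a counterexample to the Theorem 8.1 inequality
    return True


def bell(x):
    # Hypothetical example: single-peaked, so every upper-contour set is an interval.
    return 1.0 / (1.0 + x * x)


def valley(x):
    # Hypothetical example: upper-contour sets such as {x : x^2 >= 1} are not convex.
    return x * x


print(is_quasi_concave_on_samples(bell, -5.0, 5.0))
print(is_quasi_concave_on_samples(valley, -5.0, 5.0))
```

A `True` result only says that no violation was found among the sampled points; a `False` result, by contrast, is a genuine counterexample to quasi-concavity.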


A quasi-concave function $f: D \to \mathbb{R}$ is said to be strictly quasi-concave if the
defining inequality can be made strict, i.e., if for all $x, y \in D$ with $x \ne y$, and for all
$\lambda \in (0, 1)$, we have

$$f[\lambda x + (1 - \lambda) y] > \min\{f(x), f(y)\}.$$

A strictly quasi-convex function is similarly defined.

In the sequel, we will use the term “quasi-convexity” to encompass the concepts 
of convex sets, and of quasi-concave and quasi-convex functions. In an obvious 
extension of the terminology we defined in the previous chapter, we will say that 
a maximization problem is a quasi-convex maximization problem if it has a convex 
constraint set and a quasi-concave objective function. A quasi-convex minimization 
problem will, similarly, refer to a minimization problem with a convex constraint 
set and a quasi-convex objective function. A quasi-convex optimization problem will 
refer to a problem that is either a quasi-convex maximization problem, or a quasi- 
convex minimization problem. 

Finally, the following observation, which relates quasi-concave and quasi-convex 
functions, enables us to focus solely on quasi-concave functions, wherever this sim- 
plifies the exposition. Deriving analogous results for quasi-convex functions using 
Theorem 8.2 is then a straightforward task, and will be left to the reader as an exercise. 


Theorem 8.2 The function $f: D \to \mathbb{R}$ is quasi-concave on $D$ if and only if $-f$
is quasi-convex on $D$. It is strictly quasi-concave on $D$ if and only if $-f$ is strictly
quasi-convex on $D$.


8.2 Quasi-Convexity as a Generalization of Convexity 


It is a simple matter to show that the set of all quasi-concave functions contains the 
set of all concave functions, and that the set of all quasi-convex functions contains 
the set of all convex functions: 


Theorem 8.3 Let $f: D \subset \mathbb{R}^n \to \mathbb{R}$. If $f$ is concave on $D$, it is also quasi-concave
on $D$. If $f$ is convex on $D$, it is also quasi-convex on $D$.

Proof Suppose $f$ is concave. Then, for all $x, y \in D$ and $\lambda \in (0, 1)$, we have

$$\begin{aligned}
f[\lambda x + (1 - \lambda) y] &\ge \lambda f(x) + (1 - \lambda) f(y) \\
&\ge \lambda \min\{f(x), f(y)\} + (1 - \lambda) \min\{f(x), f(y)\} \\
&= \min\{f(x), f(y)\},
\end{aligned}$$

so $f$ is also quasi-concave. A similar argument establishes that if $f$ is convex, it is
also quasi-convex. □


The converse of this result is false, as the following example shows: 


Example 8.4 Let $f: \mathbb{R} \to \mathbb{R}$ be any nondecreasing function on $\mathbb{R}$. Then, $f$ is both
quasi-concave and quasi-convex on $\mathbb{R}$. To see this, pick any $x$ and $y$ in $\mathbb{R}$, and any
$\lambda \in (0, 1)$. Assume, without loss of generality, that $x > y$. Then, $x > \lambda x + (1 - \lambda) y > y$.
Since $f$ is nondecreasing, we have

$$f(x) \ge f[\lambda x + (1 - \lambda) y] \ge f(y).$$

Since $f(x) = \max\{f(x), f(y)\}$, the first inequality shows that $f$ is quasi-convex
on $\mathbb{R}$. Since $f(y) = \min\{f(x), f(y)\}$, the second inequality shows that $f$ is
quasi-concave on $\mathbb{R}$.

Since it is always possible to choose a nondecreasing function $f$ that is neither
concave nor convex on $\mathbb{R}$ (for instance, take $f(x) = x^3$ for all $x \in \mathbb{R}$), we have
shown that not every quasi-concave function is concave, and not every quasi-convex
function is convex. □
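
The claim about $f(x) = x^3$ can be checked on a small grid. The following is a minimal sketch with hypothetical helper names (`midpoint_values`, `quasi_inequalities_hold`); the grid and weights are arbitrary choices made for illustration.

```python
def cube(x):
    return x ** 3


def midpoint_values(f, x, y, lam):
    """Return f at the convex combination z, followed by the three benchmarks
    it is compared with: min, max, and the weighted average of f(x), f(y)."""
    z = lam * x + (1 - lam) * y
    return f(z), min(f(x), f(y)), max(f(x), f(y)), lam * f(x) + (1 - lam) * f(y)


def quasi_inequalities_hold(f, points, weights):
    """Check min{f(x), f(y)} <= f(z) <= max{f(x), f(y)} on a finite grid."""
    for x in points:
        for y in points:
            for lam in weights:
                fz, lo, hi, _ = midpoint_values(f, x, y, lam)
                if not (lo <= fz <= hi):
                    return False
    return True


points = [i / 2 for i in range(-10, 11)]   # grid on [-5, 5]
weights = [0.25, 0.5, 0.75]
print(quasi_inequalities_hold(cube, points, weights))

# Concavity would need f(z) >= weighted average; compare the two between 0 and 2.
fz, _, _, avg = midpoint_values(cube, 0.0, 2.0, 0.5)
print(fz, avg)

# Convexity would need f(z) <= weighted average; compare the two between -2 and 0.
fz, _, _, avg = midpoint_values(cube, -2.0, 0.0, 0.5)
print(fz, avg)
```

The first check covers both the quasi-concavity and the quasi-convexity inequality; the last two comparisons exhibit one point pair where concavity fails and one where convexity fails.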


The next result elaborates on the relationship between concave and quasi-concave 
functions. It also raises a very important question regarding the value of quasi- 
convexity for optimization theory. This is discussed after the proof of the theorem. 


Theorem 8.5 If $f: D \to \mathbb{R}$ is quasi-concave on $D$, and $\varphi: \mathbb{R} \to \mathbb{R}$ is a monotone
nondecreasing function, then the composition $\varphi \circ f$ is a quasi-concave function from
$D$ to $\mathbb{R}$. In particular, any monotone transform of a concave function results in a
quasi-concave function.


Proof Pick any $x$ and $y$ in $D$, and any $\lambda \in (0, 1)$. We will show that

$$\varphi(f[\lambda x + (1 - \lambda) y]) \ge \min\{\varphi(f(x)), \varphi(f(y))\},$$

which will complete the proof. Indeed, this is immediate. Since $f$ is quasi-concave
by hypothesis, we have

$$f[\lambda x + (1 - \lambda) y] \ge \min\{f(x), f(y)\}.$$

Since $\varphi$ is nondecreasing, this implies

$$\varphi(f[\lambda x + (1 - \lambda) y]) \ge \varphi(\min\{f(x), f(y)\}) = \min\{\varphi(f(x)), \varphi(f(y))\}.$$

The proof is complete. □


A natural question that arises is whether the converse of the second part of
Theorem 8.5 is also true, that is, whether every quasi-concave function is a monotone
transformation of some concave function. This question is significant in light of
Theorem 2.5, which states that if $\varphi$ is any strictly increasing function on $\mathbb{R}$, then a point
$x^*$ maximizes a given function $f$ over a given constraint set $D$ if and only if $x^*$
maximizes the composition $\varphi \circ f$ over $D$.¹ If it turned out that every quasi-concave
function could be obtained as a strictly increasing transformation of some concave
function, then any optimization problem with a quasi-concave objective could be
converted to an equivalent problem with a concave objective. Thus, quasi-convexity
would have nothing to add to convexity, at least from the point of view of optimization
theory.

It turns out, however, that this concern is without foundation: there do exist
functions which are quasi-concave, but which are not obtainable as a strictly increasing
transformation of a concave function. A particularly simple example of such a
function is presented in Example 8.6 below. Thus, the notion of quasi-concavity is a
genuine generalization of the notion of concavity.


Example 8.6 Let $f: \mathbb{R}_+ \to \mathbb{R}$ be defined by

$$f(x) = \begin{cases} 0, & x \in [0, 1] \\ (x - 1)^2, & x > 1. \end{cases}$$

Figure 8.1 illustrates this function. Evidently, $f$ is a nondecreasing, and therefore
quasi-concave, function on $\mathbb{R}_+$. Note that $f$ is constant in $x$ for $x \in [0, 1]$, and is
strictly increasing in $x$ for $x > 1$.

Suppose there existed a concave function $g$ and a strictly increasing function $\varphi$
such that $\varphi \circ g = f$. We will show that a contradiction must result.

We claim first that $g$ must be constant on the interval $[0, 1]$. To see this, suppose
there existed $x$ and $y$ in $[0, 1]$ such that $g(x) \ne g(y)$, say $g(x) > g(y)$. Then, since
$\varphi$ is strictly increasing, we must also have $\varphi(g(x)) > \varphi(g(y))$. This contradicts the
requirement that $\varphi \circ g$ be constant on $[0, 1]$.

Next we claim that $g$ is strictly increasing in $x$ for $x > 1$. For, suppose there
existed points $x$ and $y$ such that $x > y > 1$, and such that $g(x) \le g(y)$. Then,
$\varphi(g(x)) \le \varphi(g(y))$. This contradicts the requirement that $\varphi \circ g$ be strictly increasing
in $x$ for $x > 1$.

Fig. 8.1. A Quasi-Concave Function That Is Not a Monotone Transform of a Concave
Function

But if $g$ is constant on the interval $[0, 1]$, it has a local maximum at every $z \in (0, 1)$.
These local maxima are not global maxima, since $g$ is increasing for $x > 1$. This
contradicts Theorem 7.13, which shows that every local maximum of a concave
function must also be a global maximum. □

¹ Note that Theorem 2.5 requires the transforming function $\varphi$ to be strictly increasing, not just nondecreasing.
Indeed, the equivalence claimed in the theorem would be false if $\varphi$ were only nondecreasing. For instance,
if $\varphi$ was a constant function, then it would also be nondecreasing, and evidently every point in the domain
would now maximize $\varphi \circ f$ over $D$; clearly, not every point need be a maximizer of $f$ on $D$.


Arrow and Enthoven (1961) provide a richer, but also more complicated, example
to illustrate this point. They consider the function $f: \mathbb{R}^2_+ \to \mathbb{R}$ given by

$$f(x, y) = (x - 1) + \left[(1 - x)^2 + 4(x + y)\right]^{1/2}.$$

Note that $f$ is a strictly increasing function on $\mathbb{R}^2_+$, i.e., we have $\partial f(x, y)/\partial x > 0$
and $\partial f(x, y)/\partial y > 0$ at all $(x, y) \in \mathbb{R}^2_+$. Moreover, the “indifference curve” of $f$ at
$k$, i.e., the locus of points $(x, y) \in \mathbb{R}^2_+$ such that $f(x, y) = k$, is the straight line

$$4y + (4 + 2k)x = k^2 + 2k.$$

(This indifference curve is graphed in Figure 8.2.) Since $f$ is strictly increasing, the
upper-contour set $U_f(k)$ consists of all points $(x, y) \in \mathbb{R}^2_+$ lying “above” (i.e., to the
northeast of) this straight line, and so is clearly convex. Thus, $f$ is quasi-concave.

Fig. 8.2. The Arrow-Enthoven Example: An Indifference Curve,
$\{(x, y) \mid f(x, y) = k\} = \{(x, y) \mid 4y + (4 + 2k)x = k^2 + 2k\}$


To prove that f cannot be a strict monotone transform of a concave function is 
significantly more difficult. We leave it to the reader as an exercise. 
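
The straight-line form of the indifference curve is easy to confirm numerically. The sketch below assumes the Arrow-Enthoven function reads $f(x, y) = (x - 1) + [(1 - x)^2 + 4(x + y)]^{1/2}$, which is the form consistent with the line $4y + (4 + 2k)x = k^2 + 2k$ stated above; the helper `y_on_line` is a name introduced here for illustration.

```python
import math


def f(x, y):
    """Arrow-Enthoven function, under the reading assumed in the lead-in."""
    return (x - 1.0) + math.sqrt((1.0 - x) ** 2 + 4.0 * (x + y))


def y_on_line(k, x):
    """Solve 4y + (4 + 2k)x = k^2 + 2k for y."""
    return (k * k + 2.0 * k - (4.0 + 2.0 * k) * x) / 4.0


k = 3.0
# On the line, y >= 0 exactly when 0 <= x <= k/2, so sample x in that range.
for i in range(6):
    x = (k / 2.0) * i / 5.0
    y = y_on_line(k, x)
    print(f"x={x:.2f}  y={y:.4f}  f(x, y)={f(x, y):.10f}")
```

Every printed value of `f(x, y)` should agree with `k` up to rounding, which is what the linear indifference curve asserts.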


8.3 Implications of Quasi-Convexity 


Quasi-concave and quasi-convex functions do not enjoy many of the properties that 
come with convexity. For instance, unlike concave and convex functions, 


1. Quasi-concave and quasi-convex functions are not necessarily continuous in the 
interior of their domains; 


2. Quasi-concave functions can have local maxima that are not global maxima, and 
quasi-convex functions can have local minima that are not global minima; 

3. First-order conditions are not sufficient to identify even local optima under
quasi-convexity.


The following example illustrates all of these points: 


Example 8.7 Let $f: \mathbb{R} \to \mathbb{R}$ be defined by

$$f(x) = \begin{cases} x^3, & x \in [0, 1] \\ 1, & x \in (1, 2] \\ x^3, & x > 2. \end{cases}$$

Since $f$ is a nondecreasing function, it is both quasi-concave and quasi-convex
on $\mathbb{R}$. Clearly, $f$ has a discontinuity at $x = 2$. Moreover, $f$ is constant on the open
interval $(1, 2)$, so every point in this interval is a local maximum of $f$ as well as a local
minimum of $f$. However, no point in $(1, 2)$ is either a global maximum or a global
minimum. Finally, $f'(0) = 0$, although $0$ is evidently neither a local maximum, nor
a local minimum. □


Another significant distinction between convexity and quasi-convexity arises from 
the fact that a strictly concave function cannot be even weakly convex, and a strictly 
convex function cannot be even weakly concave. In contrast, a strictly quasi-concave 
function may well be strictly quasi-convex also: a simple modification of
Example 8.4 shows that if $f$ is any strictly increasing function on $\mathbb{R}$ (i.e., if $x > y$
implies $f(x) > f(y)$), then $f$ is both strictly quasi-concave and strictly quasi-convex
on $\mathbb{R}$.

However, as with concave and convex functions, the derivatives of quasi-concave 
and quasi-convex functions do possess a good deal of structure. In particular, it is 
possible to provide analogs of Theorems 7.9 and 7.10, which characterized the con- 
vexity properties of functions using their first- and second-derivatives, respectively. 
Here is the first result in this direction: 


Theorem 8.8 Let $f: D \to \mathbb{R}$ be a $C^1$ function where $D \subset \mathbb{R}^n$ is convex and open.
Then $f$ is a quasi-concave function on $D$ if and only if it is the case that for any
$x, y \in D$,

$$f(y) \ge f(x) \;\Longrightarrow\; Df(x)(y - x) \ge 0.$$

Proof See Section 8.6. □
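
To see the first-derivative test of Theorem 8.8 at work, the sketch below searches for pairs $(x, y)$ with $f(y) \ge f(x)$ but $Df(x)(y - x) < 0$. The two test functions, the sampling box, and the helper name `violates_first_order_test` are assumptions made for illustration: $f(x_1, x_2) = x_1 x_2$ on the positive orthant is quasi-concave, while $x_1^2 + x_2^2$ is not.

```python
import random


def violates_first_order_test(f, grad, p, q, tol=1e-12):
    """True if f(q) >= f(p) and yet Df(p)(q - p) < 0."""
    g = grad(p)
    directional = sum(gi * (qi - pi) for gi, pi, qi in zip(g, p, q))
    return f(q) >= f(p) and directional < -tol


def product(p):
    return p[0] * p[1]


def grad_product(p):
    return (p[1], p[0])


def sum_of_squares(p):
    return p[0] ** 2 + p[1] ** 2


def grad_sum_of_squares(p):
    return (2 * p[0], 2 * p[1])


rng = random.Random(1)
pairs = [
    ((rng.uniform(0.1, 5), rng.uniform(0.1, 5)),
     (rng.uniform(0.1, 5), rng.uniform(0.1, 5)))
    for _ in range(5000)
]

# No violation is expected for the quasi-concave function x1*x2 ...
print(any(violates_first_order_test(product, grad_product, p, q) for p, q in pairs))
# ... whereas the convex function x1^2 + x2^2 fails the test at this pair.
print(violates_first_order_test(sum_of_squares, grad_sum_of_squares,
                                (1.0, 3.0), (3.5, 0.2)))
```

As with any sampling check, the absence of a violation for the first function is only suggestive; the single violating pair for the second function is conclusive.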


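It is also possible to give a second-derivative test for quasi-convexity along the
lines of Theorem 7.10 for concave and convex functions. Let a $C^2$ function $f$ defined
on some open domain $D \subset \mathbb{R}^n$ be given, and let $x \in D$. For $k = 1, \ldots, n$, let $C_k(x)$
be the $(k + 1) \times (k + 1)$ matrix defined by

$$C_k(x) = \begin{pmatrix}
0 & \dfrac{\partial f}{\partial x_1}(x) & \cdots & \dfrac{\partial f}{\partial x_k}(x) \\
\dfrac{\partial f}{\partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_1^2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_k}(x) \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f}{\partial x_k}(x) & \dfrac{\partial^2 f}{\partial x_k \partial x_1}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_k^2}(x)
\end{pmatrix}.$$

Theorem 8.9 Let $f: D \to \mathbb{R}$ be a $C^2$ function, where $D \subset \mathbb{R}^n$ is open and convex.
Then:

1. If $f$ is quasi-concave on $D$, we have $(-1)^k |C_k(x)| \ge 0$ for $k = 1, \ldots, n$, at every $x \in D$.
2. If $(-1)^k |C_k(x)| > 0$ for all $k \in \{1, \ldots, n\}$ at every $x \in D$, then $f$ is quasi-concave on $D$.

Proof See Section 8.7. □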


There is one very important difference between Theorem 8.9 and the corresponding
result for concave functions, Theorem 7.10. In Theorem 7.10, a weak inequality (viz.,
the negative semidefiniteness of $D^2 f$) was both necessary and sufficient to establish
concavity. In the current result, the weak inequality is necessary for quasi-concavity,
but the sufficient condition involves a strict inequality. The following example shows
that the weak inequality is not sufficient to identify quasi-concavity, so Theorem 8.9
cannot be made into a perfect analog of Theorem 7.10.²


Example 8.10 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be given by

$$f(x, y) = (x - 1)^2 (y - 1)^2, \qquad (x, y) \in \mathbb{R}^2.$$

Then, we have $Df(x, y) = (2(x - 1)(y - 1)^2,\; 2(x - 1)^2 (y - 1))$, and

$$D^2 f(x, y) = \begin{pmatrix} 2(y - 1)^2 & 4(x - 1)(y - 1) \\ 4(x - 1)(y - 1) & 2(x - 1)^2 \end{pmatrix}.$$

Therefore,

$$C_1(x, y) = \begin{pmatrix} 0 & 2(x - 1)(y - 1)^2 \\ 2(x - 1)(y - 1)^2 & 2(y - 1)^2 \end{pmatrix}$$

and

$$C_2(x, y) = \begin{pmatrix}
0 & 2(x - 1)(y - 1)^2 & 2(x - 1)^2 (y - 1) \\
2(x - 1)(y - 1)^2 & 2(y - 1)^2 & 4(x - 1)(y - 1) \\
2(x - 1)^2 (y - 1) & 4(x - 1)(y - 1) & 2(x - 1)^2
\end{pmatrix}.$$

A pair of simple calculations now yields

$$\begin{aligned}
|C_1(x, y)| &= -4(x - 1)^2 (y - 1)^4 \le 0, & (x, y) \in \mathbb{R}^2, \\
|C_2(x, y)| &= 16(x - 1)^4 (y - 1)^4 \ge 0, & (x, y) \in \mathbb{R}^2,
\end{aligned}$$

with equality holding in either case if and only if $x = 1$ or $y = 1$. Thus, if weak
inequalities were also sufficient to establish quasi-concavity, $f$ would have to be
quasi-concave. However, $f$ is not quasi-concave: we have $f(0, 0) = f(2, 2) = 1$,
but for $\lambda = 1/2$,

$$f[\lambda(0, 0) + (1 - \lambda)(2, 2)] = f(1, 1) = 0 < 1 = \min\{f(0, 0), f(2, 2)\}.$$

Thus, the sufficient condition of Theorem 8.9 cannot be strengthened to allow for
weak inequalities. □

² There is also another difference between the two results. In Theorem 7.10, the strict inequality (the negative
definiteness of $D^2 f$) was shown to be sufficient for strict concavity. In Theorem 8.9, the strict inequality
is claimed to be sufficient only for quasi-concavity and not for strict quasi-concavity. We leave it to the
reader to check whether, or under what conditions, strict quasi-concavity results from the strict inequality.
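
The two determinants in Example 8.10 can be reproduced mechanically. The sketch below assumes the function reads $f(x, y) = (x - 1)^2 (y - 1)^2$, hard-codes its partial derivatives, and evaluates the bordered matrices on an integer grid so that the comparison with the closed forms is exact; `det2`, `det3` and `bordered_dets` are helper names introduced here.

```python
def f(x, y):
    return (x - 1) ** 2 * (y - 1) ** 2


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    (a, b, c), (d, e, g), (h, i, j) = m
    return a * (e * j - g * i) - b * (d * j - g * h) + c * (d * i - e * h)


def bordered_dets(x, y):
    """Return (|C_1|, |C_2|) for f(x, y) = (x-1)^2 (y-1)^2."""
    fx = 2 * (x - 1) * (y - 1) ** 2
    fy = 2 * (x - 1) ** 2 * (y - 1)
    fxx = 2 * (y - 1) ** 2
    fxy = 4 * (x - 1) * (y - 1)
    fyy = 2 * (x - 1) ** 2
    c1 = [[0, fx], [fx, fxx]]
    c2 = [[0, fx, fy], [fx, fxx, fxy], [fy, fxy, fyy]]
    return det2(c1), det3(c2)


# Compare with the closed forms -4(x-1)^2(y-1)^4 and 16(x-1)^4(y-1)^4.
for x in range(-3, 5):
    for y in range(-3, 5):
        d1, d2 = bordered_dets(x, y)
        assert d1 == -4 * (x - 1) ** 2 * (y - 1) ** 4
        assert d2 == 16 * (x - 1) ** 4 * (y - 1) ** 4

# The weak sign conditions hold everywhere, yet quasi-concavity fails at the midpoint.
print(f(1, 1), min(f(0, 0), f(2, 2)))
```

The last line prints the value at the midpoint of $(0, 0)$ and $(2, 2)$ next to the smaller of the two endpoint values, which is the comparison used in the example.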


As with the second-derivative test provided by Theorem 7.10 for convexity, the test 
given in Theorem 8.9 is also of considerable practical value. The following example 
illustrates this point. 


Example 8.11 Let $f: \mathbb{R}^2_{++} \to \mathbb{R}$ be defined by

$$f(x, y) = x^a y^b, \qquad a, b > 0.$$

We have seen in Example 7.12 that this function is strictly concave on $\mathbb{R}^2_{++}$ if
$a + b < 1$, that it is concave on $\mathbb{R}^2_{++}$ if $a + b = 1$, and that it is neither concave
nor convex on $\mathbb{R}^2_{++}$ if $a + b > 1$. We will show using Theorem 8.9 that it is
quasi-concave on $\mathbb{R}^2_{++}$ for all $a, b > 0$. Note that to show this directly from the definition
of quasi-concavity would require us to prove that the following inequality holds for
all distinct $(x, y)$ and $(\hat{x}, \hat{y})$ in $\mathbb{R}^2_{++}$, and for all $\lambda \in (0, 1)$:

$$(\lambda x + (1 - \lambda)\hat{x})^a (\lambda y + (1 - \lambda)\hat{y})^b \ge \min\{x^a y^b,\; \hat{x}^a \hat{y}^b\}.$$

In contrast, if we appeal to Theorem 8.9, we only have to show that $|C_1(x, y)| < 0$
and $|C_2(x, y)| > 0$, where

$$C_1(x, y) = \begin{pmatrix} 0 & a x^{a-1} y^b \\ a x^{a-1} y^b & a(a - 1) x^{a-2} y^b \end{pmatrix}$$

and

$$C_2(x, y) = \begin{pmatrix}
0 & a x^{a-1} y^b & b x^a y^{b-1} \\
a x^{a-1} y^b & a(a - 1) x^{a-2} y^b & ab\, x^{a-1} y^{b-1} \\
b x^a y^{b-1} & ab\, x^{a-1} y^{b-1} & b(b - 1) x^a y^{b-2}
\end{pmatrix}.$$

A simple calculation shows that $|C_1(x, y)| = -a^2 x^{2(a-1)} y^{2b}$, which is strictly
negative for all $(x, y) \in \mathbb{R}^2_{++}$, while

$$|C_2(x, y)| = x^{3a-2} y^{3b-2} (ab^2 + ba^2),$$

which is strictly positive for all $(x, y) \in \mathbb{R}^2_{++}$. The quasi-concavity of $f$ on $\mathbb{R}^2_{++}$ is
established. □
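
The "simple calculation" in Example 8.11 can be cross-checked numerically. The sketch below assumes the closed forms $|C_1| = -a^2 x^{2(a-1)} y^{2b}$ and $|C_2| = ab(a + b)\, x^{3a-2} y^{3b-2}$, and compares them with determinants built directly from the partial derivatives of $x^a y^b$; the exponents and sample points are arbitrary illustrative choices.

```python
import math


def bordered_dets(x, y, a, b):
    """(|C_1|, |C_2|) for f(x, y) = x^a y^b, expanded from the bordered matrices."""
    fx = a * x ** (a - 1) * y ** b
    fy = b * x ** a * y ** (b - 1)
    fxx = a * (a - 1) * x ** (a - 2) * y ** b
    fxy = a * b * x ** (a - 1) * y ** (b - 1)
    fyy = b * (b - 1) * x ** a * y ** (b - 2)
    d1 = -fx * fx                                   # det [[0, fx], [fx, fxx]]
    d2 = -fx * fx * fyy + 2 * fx * fy * fxy - fy * fy * fxx
    return d1, d2


def closed_forms(x, y, a, b):
    d1 = -a * a * x ** (2 * (a - 1)) * y ** (2 * b)
    d2 = a * b * (a + b) * x ** (3 * a - 2) * y ** (3 * b - 2)
    return d1, d2


a, b = 0.7, 1.8   # a + b > 1, so f is not concave, yet the sign test still applies
for x, y in [(0.5, 2.0), (1.0, 1.0), (3.0, 0.25)]:
    d1, d2 = bordered_dets(x, y, a, b)
    e1, e2 = closed_forms(x, y, a, b)
    print(math.isclose(d1, e1, rel_tol=1e-9), math.isclose(d2, e2, rel_tol=1e-9),
          d1 < 0 < d2)
```

Agreement at a handful of points is not a proof, but it is a cheap guard against algebra slips when expanding the $3 \times 3$ determinant by hand.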


8.4 Quasi-Convexity and Optimization 
We have already seen that local maxima of quasi-concave functions need not, in 
general, also be global maxima. When the function involved is strictly quasi-concave, 
however, a sharp result is valid: 


Theorem 8.12 Suppose $f: D \to \mathbb{R}$ is strictly quasi-concave, where $D \subset \mathbb{R}^n$ is
convex. Then, any local maximum of $f$ on $D$ is also a global maximum of $f$ on $D$.
Moreover, the set $\arg\max\{f(x) \mid x \in D\}$ of maximizers of $f$ on $D$ is either empty
or a singleton.


Proof Suppose $x$ is a local maximum of $f$ on $D$, so there exists $r > 0$ such that
$f(x) \ge f(y)$ for all $y \in B(x, r) \cap D$. If $x$ were not a global maximum of $f$ on $D$,
there must be $z \in D$ such that $f(z) > f(x)$. Let $y(\lambda) = \lambda x + (1 - \lambda) z$. Note that
$y(\lambda) \in D$ because $D$ is convex. By the strict quasi-concavity of $f$, we have

$$f[y(\lambda)] > \min\{f(x), f(z)\} = f(x),$$

for any $\lambda \in (0, 1)$. But for $\lambda > 1 - r/d(x, z)$, we have $d(x, y(\lambda)) < r$, so $y(\lambda) \in B(x, r) \cap D$,
which contradicts the hypothesis that $x$ is a local maximum, and establishes one part
of the result.

To see the other part, suppose $x$ and $y$ are distinct maximizers of $f$ on $D$. Pick any $\lambda \in (0, 1)$
and let $z = \lambda x + (1 - \lambda) y$. Since $D$ is convex, $z \in D$. Since $f$ is strictly
quasi-concave, $f(z) = f[\lambda x + (1 - \lambda) y] > \min\{f(x), f(y)\} = f(x) = f(y)$, and this
contradicts the hypothesis that $x$ and $y$ are maximizers. □


A similar result to Theorem 8.12 is true for strictly quasi-convex functions in 
minimization problems. It is left to the reader to provide details. The most significant 
part of Theorem 8.12 for optimization is that, as with strict concavity, strict quasi- 
concavity also implies uniqueness of the solution. 

Perhaps the most important result concerning quasi-concave functions in optimization
theory is the following. It shows that the Kuhn-Tucker first-order conditions are
“almost” sufficient to identify the global optimum of an inequality-constrained
maximization problem, if all the functions involved are quasi-concave:

Theorem 8.13 (The Theorem of Kuhn and Tucker under Quasi-Convexity) Let
$f$ and $h_i$ $(i = 1, \ldots, k)$ be $C^1$ quasi-concave functions mapping $U \subset \mathbb{R}^n$ into $\mathbb{R}$,
where $U$ is open and convex. Define

$$D = \{x \in U \mid h_i(x) \ge 0,\; i = 1, \ldots, k\}.$$

Suppose there exist $x^* \in D$ and $\lambda \in \mathbb{R}^k$ such that the Kuhn-Tucker first-order
conditions are met:

[KT-1] $Df(x^*) + \sum_{i=1}^{k} \lambda_i Dh_i(x^*) = 0$.

[KT-2] $\lambda_i \ge 0$, $\lambda_i h_i(x^*) = 0$, $i = 1, \ldots, k$.

Then, $x^*$ maximizes $f$ over $D$ provided at least one of the following conditions holds:

[QC-1] $Df(x^*) \ne 0$.
[QC-2] $f$ is concave.


Proof See Section 8.8. □


It is very important to note that Theorem 8.13 does not assert that the first-order
conditions of the Kuhn-Tucker theorem are sufficient under quasi-concavity, unless
one of the conditions [QC-1] or [QC-2] holds. Indeed, if [QC-1] and [QC-2] both
fail, it is easy to construct examples where the Kuhn-Tucker first-order conditions do
not identify a maximum point, even though all the other conditions of Theorem 8.13
are met. Consider the following:


Example 8.14 Let $f: \mathbb{R} \to \mathbb{R}$ and $h: \mathbb{R} \to \mathbb{R}$ be quasi-concave $C^1$ functions given
by

$$f(x) = \begin{cases} x^3, & x < 0 \\ 0, & 0 \le x \le 1 \\ (x - 1)^2, & x > 1 \end{cases}$$

and $h(x) = x$ for all $x \in \mathbb{R}$, respectively. Consider the problem of maximizing $f$
over the inequality-constrained set $D = \{x \in \mathbb{R} \mid h(x) \ge 0\}$. Note that $f$ is not
concave, so condition [QC-2] does not hold in this problem.³

We claim that for any point $x^* \in [0, 1]$, there is $\lambda^* \ge 0$ such that the pair
$(x^*, \lambda^*)$ meets the Kuhn-Tucker first-order conditions. At any such point, [QC-1]
fails, since $f'(x^*) = 0$. Indeed, pick any point $x^* \in [0, 1]$ and set $\lambda^* = 0$. Then,
since $x^* \in [0, 1]$, we have $f'(x^*) = 0$. Since $\lambda^* = 0$, we also have $\lambda^* h'(x^*) = 0$.
Therefore,

$$f'(x^*) + \lambda^* h'(x^*) = 0,$$

and [KT-1] holds. Moreover, it is clearly true that for any $x^* \in [0, 1]$, we have
$h(x^*) \ge 0$ and $\lambda^* h(x^*) = 0$, so [KT-2] also holds. Thus, $(x^*, \lambda^*)$ meets the
Kuhn-Tucker first-order conditions.

However, it is evident that no $x^* \in [0, 1]$ can be a solution to the problem: since
$f$ is unbounded on the feasible set $\mathbb{R}_+$, a solution to the problem does not exist. □

³ In fact, we have established in Example 8.6 that $f$ cannot even be a strictly increasing transformation of a
concave function.
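
The failure described in Example 8.14 can be replayed in a few lines. This is a minimal sketch under the reading $f(x) = x^3$ for $x < 0$, $f(x) = 0$ on $[0, 1]$, $f(x) = (x - 1)^2$ for $x > 1$, and $h(x) = x$; the function `meets_kuhn_tucker` is a hypothetical helper, not notation from the text.

```python
def f(x):
    if x < 0:
        return x ** 3
    if x <= 1:
        return 0.0
    return (x - 1) ** 2


def f_prime(x):
    if x < 0:
        return 3 * x ** 2
    if x <= 1:
        return 0.0
    return 2 * (x - 1)


def h(x):
    return x


def h_prime(x):
    return 1.0


def meets_kuhn_tucker(x, lam, tol=1e-12):
    """Feasibility plus [KT-1] f'(x) + lam*h'(x) = 0 and [KT-2] lam >= 0, lam*h(x) = 0."""
    feasible = h(x) >= 0
    kt1 = abs(f_prime(x) + lam * h_prime(x)) <= tol
    kt2 = lam >= 0 and abs(lam * h(x)) <= tol
    return feasible and kt1 and kt2


x_star, lam_star = 0.5, 0.0
print(meets_kuhn_tucker(x_star, lam_star))   # first-order conditions hold at x* = 0.5
print(f_prime(x_star) != 0)                  # [QC-1] check at x*
print(f(3.0) > f(x_star))                    # a feasible point with a higher value
```

The point $x^* = 0.5$ passes the first-order conditions while [QC-1] fails there, and the feasible point $x = 3$ attains a strictly larger objective value, so $x^*$ is not a maximizer.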


It is also important to note that Theorem 8.13 only provides sufficient conditions 
that identify an optimum. It does not assert these conditions are necessary, and indeed, 
they are not unless the constraint qualification is met at x*, or, as in Theorem 7.16, 
the functions $h_i$ are all concave and Slater’s condition holds.


8.5 Using Quasi-Convexity in Optimization Problems 


Theorem 8.13 is of obvious value in solving optimization problems of the form

$$\text{Maximize } f(x) \text{ subject to } x \in D = \{z \in \mathbb{R}^n \mid h_i(z) \ge 0,\; i = 1, \ldots, l\},$$

where $f$ and $h_i$ are quasi-concave $C^1$ functions, $i = 1, \ldots, l$. A simple three-step
procedure may be employed.


1. Set up the Lagrangean $L(x, \lambda) = f(x) + \sum_{i=1}^{l} \lambda_i h_i(x)$.
2. Calculate the set of critical points of $L$, i.e., the set of points $(x, \lambda)$ at which the
following conditions hold:

$$Df(x) + \sum_{i=1}^{l} \lambda_i Dh_i(x) = 0,$$

$$h_i(x) \ge 0, \quad \lambda_i \ge 0, \quad \lambda_i h_i(x) = 0, \quad i = 1, \ldots, l.$$

Of course, the set of critical points of $L$ is the same as the set of points $(x, \lambda)$
that satisfy $h(x) \ge 0$ as well as the first-order conditions [KT-1] and [KT-2].

3. Identify the critical points $(x, \lambda)$ of $L$ which also satisfy $Df(x) \ne 0$. At any
such point $(x, \lambda)$, condition [QC-1] of Theorem 8.13 is satisfied, so $x$ must be a
solution to the given maximization problem.
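
The three steps can be traced on a small problem. The example below is hypothetical and not taken from the text: maximize $f(x, y) = xy$ over $U = \mathbb{R}^2_{++}$ subject to $h(x, y) = 4 - x - 2y \ge 0$. The critical point is solved by hand in the comments; the code only verifies it and compares it with a brute-force grid search.

```python
def f(x, y):
    return x * y


def grad_f(x, y):
    return (y, x)


def h(x, y):
    return 4.0 - x - 2.0 * y


GRAD_H = (-1.0, -2.0)  # Dh is constant because h is linear


def is_critical_point(x, y, lam, tol=1e-12):
    """Step 2: Df + lam*Dh = 0, h >= 0, lam >= 0, lam*h = 0."""
    gx, gy = grad_f(x, y)
    stationary = (abs(gx + lam * GRAD_H[0]) <= tol
                  and abs(gy + lam * GRAD_H[1]) <= tol)
    return (stationary and h(x, y) >= -tol and lam >= 0
            and abs(lam * h(x, y)) <= tol)


def satisfies_qc1(x, y):
    """Step 3: the gradient of the objective does not vanish."""
    return grad_f(x, y) != (0.0, 0.0)


# Step 1 is the Lagrangean L = xy + lam*(4 - x - 2y).
# By hand: lam = 0 would force Df = (y, x) = 0, impossible on U, so h = 0,
# y = lam and x = 2*lam, which gives lam = 1 and (x, y) = (2, 1).
candidate = (2.0, 1.0, 1.0)
print(is_critical_point(*candidate), satisfies_qc1(2.0, 1.0))

# Brute-force comparison on a grid of feasible points (step size 1/20).
best_on_grid = max(
    f(i / 20, j / 20)
    for i in range(1, 81) for j in range(1, 41)
    if i + 2 * j <= 80
)
print(best_on_grid, f(2.0, 1.0))
```

The grid search is only a sanity check on the conclusion of Theorem 8.13; the theorem itself is what guarantees that the critical point with a nonvanishing gradient is a global maximizer.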


As an alternative to appealing to condition [QC-1] in step 3, it might be easier
in some problems to use condition [QC-2] that $f$ is concave. Indeed, it may be
possible to use [QC-2] even if $f$ is not itself concave, if $f$ happens to at least be
an increasing transformation of a concave function. In this case, the given problem
can be transformed into an equivalent one with a concave objective function, and
[QC-2] can be used to obtain sufficiency of the first-order conditions in the
transformed problem. Formally, we begin in this case by checking the following condition,
that we call [QC-2′]:


[QC-2′] $f$ is a strictly increasing transform of a $C^1$ concave function $\hat{f}$.⁴


If [QC-2′] holds, the given optimization problem is equivalent to the following one
in which the objective function is the concave function $\hat{f}$:

$$\text{Maximize } \hat{f}(x) \text{ subject to } x \in D = \{z \in \mathbb{R}^n \mid h_i(z) \ge 0,\; i = 1, \ldots, l\}.$$

The concavity of $\hat{f}$ in this transformed problem implies that condition [QC-2] of
Theorem 8.13 is met. If we now set up the Lagrangean

$$\hat{L}(x, \lambda) = \hat{f}(x) + \sum_{i=1}^{l} \lambda_i h_i(x),$$

it follows from Theorem 8.13 that every critical point of $\hat{L}$ will identify a solution to
the transformed problem, and, therefore, also to the original problem.

Of course, if it turns out that [QC-2′] is inapplicable to the given problem, and
there is also no critical point $(x, \lambda)$ of the Lagrangean $L$ at which $Df(x) \ne 0$, then
it is not possible to appeal to Theorem 8.13. As we have seen, it is possible under
these circumstances that a point may meet the Kuhn-Tucker first-order conditions,
and still not be a solution to the problem.
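
As a worked illustration of the transformation route, consider a hypothetical problem that is not taken from the text: maximize $f(x, y) = xy$ over $U = \mathbb{R}^2_{++}$ subject to $h(x, y) = 4 - x - 2y \ge 0$. Here $f = \varphi \circ \hat{f}$ with $\hat{f}(x, y) = \ln x + \ln y$, which is concave and $C^1$ on $U$, and $\varphi(t) = e^t$, which is strictly increasing. The critical points of the transformed Lagrangean can then be computed as follows.

```latex
\begin{align*}
\hat{L}(x, y, \lambda) &= \ln x + \ln y + \lambda\,(4 - x - 2y), \\
\frac{\partial \hat{L}}{\partial x} &= \frac{1}{x} - \lambda = 0, \qquad
\frac{\partial \hat{L}}{\partial y} = \frac{1}{y} - 2\lambda = 0
  \quad\Longrightarrow\quad \lambda > 0,\; x = \frac{1}{\lambda},\; y = \frac{1}{2\lambda}, \\
\lambda\,(4 - x - 2y) &= 0 \text{ with } \lambda > 0
  \quad\Longrightarrow\quad \frac{1}{\lambda} + \frac{1}{\lambda} = 4
  \quad\Longrightarrow\quad \lambda = \tfrac{1}{2},\; (x, y) = (2, 1).
\end{align*}
```

Since $\hat{f}$ is concave, condition [QC-2] applies to the transformed problem, so $(2, 1)$ solves it, and hence it also maximizes $f(x, y) = xy$ on the same constraint set, with $f(2, 1) = 2$.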


⁴ Note that this condition does not exclude the possibility that $f$ is itself concave. For, if $f$ is itself concave,
we can simply let $\hat{f} = f$, and take $\varphi$ to be the identity function $\varphi(x) = x$ for all $x$. Then $\varphi$ is a strictly
increasing function, and $\hat{f}$ is concave, and, of course, we now have $f = \varphi \circ \hat{f}$.


8.6 A Proof of the First-Derivative Characterization of Quasi-Convexity


First, suppose $f$ is quasi-concave on $D$, and let $x, y \in D$ be such that $f(y) \ge f(x)$.
Let $t \in (0, 1)$. Since $f$ is quasi-concave, we have

$$f(x + t(y - x)) = f((1 - t)x + ty) \ge \min\{f(x), f(y)\} = f(x).$$

Therefore, it is the case that for all $t \in (0, 1)$:

$$\frac{f(x + t(y - x)) - f(x)}{t} \ge 0.$$

As $t \to 0+$, the LHS of this expression converges to $Df(x)(y - x)$, so
$Df(x)(y - x) \ge 0$, establishing one part of the result.

Now suppose that for all $x, y \in D$ such that $f(y) \ge f(x)$, we have
$Df(x)(y - x) \ge 0$. Pick any $x, y \in D$, and suppose without loss of generality
that $f(x) = \min\{f(x), f(y)\}$. We will show that for any $t \in [0, 1]$, we must also
have $f[(1 - t)x + ty] \ge \min\{f(x), f(y)\}$, establishing the quasi-concavity of $f$.
For notational simplicity, let $z(t) = (1 - t)x + ty$.

Define $g(t) = f[x + t(y - x)]$. Note that $g(0) = f(x) \le f(y) = g(1)$, and
that $g$ is $C^1$ on $[0, 1]$ with $g'(t) = Df[x + t(y - x)](y - x)$. We will show that if
$t^* \in (0, 1)$ is any point such that $f[z(t^*)] < f(x)$ (i.e., such that $g(t^*) < g(0)$), we
must have $g'(t^*) = 0$. This evidently precludes the possibility of having any point
$\hat{t} \in (0, 1)$ such that $g(\hat{t}) < g(0)$: if there were such a point, then, letting $t_0$ be the
largest $t \in [0, \hat{t}]$ with $g(t) \ge g(0)$, we would have $g' = 0$ on $(t_0, \hat{t}]$, so $g$ would be
constant on $[t_0, \hat{t}]$, giving $g(\hat{t}) = g(t_0) \ge g(0)$. The desired result is then established.

So suppose that $t^* \in (0, 1)$ and we have $f(x) > f[z(t^*)]$. Then, by hypothesis,
we must also have $Df[z(t^*)](x - z(t^*)) = -t^* Df[z(t^*)](y - x) \ge 0$. Since
$t^* > 0$, this implies $g'(t^*) \le 0$. On the other hand, since it is also true that $f(y) \ge
f(x)$, we have $f(y) > f[z(t^*)]$, so we must also have $Df[z(t^*)](y - z(t^*)) =
(1 - t^*) Df[z(t^*)](y - x) \ge 0$. Since $t^* < 1$, this implies in turn that $g'(t^*) \ge 0$. It
follows that $g'(t^*) = 0$. □


8.7 A Proof of the Second-Derivative Characterization of Quasi-Convexity 
We prove Part 1 of the theorem first. Suppose $f$ is a quasi-concave and $C^2$ function
on the open and convex set $D \subset \mathbb{R}^n$. Pick any $x \in D$. If $Df(x) = 0$, then we
have $|C_k(x)| = 0$ for each $k$, since the first row of this matrix is null. This evidently
implies $(-1)^k |C_k(x)| \ge 0$ for all $k$, so Part 1 of the theorem is true in this case.

Now suppose $Df(x) \ne 0$. Define $g$ by $g(y) = Df(x)(x - y)$ for $y \in \mathbb{R}^n$, and
consider the problem of maximizing $f$ over the constraint set

$$A[x] = \{y \in \mathbb{R}^n \mid g(y) \ge 0\}.$$

Since the constraint function is linear, $A[x]$ is a convex set. Since the objective
function $f$ is quasi-concave by hypothesis, Theorem 8.13 states that any point $(y, \lambda)$
meeting the Kuhn-Tucker first-order conditions identifies a solution to the problem,
provided $Df(y) \ne 0$. We will show that for $\lambda = 1$, the pair $(x, \lambda)$ meets the
Kuhn-Tucker first-order conditions. Since we are assuming that $Df(x) \ne 0$, this means $x$
is a global maximum of $f$ on $A[x]$. Indeed, since $Dg(y) = -Df(x)$ at each $y$, we
have at $(y, \lambda) = (x, 1)$,

$$Df(y) + \lambda Dg(y) = Df(x) + (-Df(x)) = 0.$$

Since $\lambda = 1$, it also satisfies $\lambda \ge 0$. Finally, at $y = x$, we have $g(y) = 0$, so
$\lambda g(y) = 0$. Thus, $x$ is a global maximum of $f$ on $A[x]$.

Now define the subset $A'[x]$ of $A[x]$ by

$$A'[x] = \{y \in \mathbb{R}^n \mid g(y) = 0\}.$$

Since $x \in A'[x]$, and $x$ maximizes $f$ over all of $A[x]$, it also maximizes $f$ over $A'[x]$.
Since $Dg(x) = -Df(x) \ne 0$, we must have $\rho(Dg(x)) = 1$. Since the constraint
qualification is met, $x$ must meet the second-order necessary conditions provided in
Theorem 5.4, that is, it must be the case that the quadratic form $D^2 f(x) + \lambda D^2 g(x)$
is negative semidefinite on the set $\{z \mid Dg(x)z = 0\}$. By Theorem 5.5, a necessary
condition for this negative semidefiniteness to obtain is that the determinants of the
submatrices $M_k$ derived from the following matrix by retaining only the first $(k + 1)$
rows and columns should have the same sign as $(-1)^k$, $k = 1, \ldots, n$:

$$M = \begin{pmatrix} 0 & Dg(x) \\ Dg(x)' & D^2 f(x) + \lambda D^2 g(x) \end{pmatrix}.$$

Since $D^2 g(x) = 0$, and $Dg(x) = -Df(x)$, the matrix $M_k$ differs from $C_k(x)$ only in the
sign of its border, and this sign cancels in the determinant, so $|M_k| = |C_k(x)|$. Thus,
we must have $(-1)^k |C_k(x)| \ge 0$ for $k = 2, \ldots, n$ (the case $k = 1$ is immediate, since
$|C_1(x)| = -(\partial f(x)/\partial x_1)^2 \le 0$), and this establishes the first part
of the theorem.

The second part of the theorem will be proved using a three-step procedure. Fix
an arbitrary point $x$ in $D$, and, as earlier, let $g(y) = Df(x)(x - y)$.

• In Step 1, we will show that $x$ is itself a strict local maximum of $f$ on the constraint
set
$$A'[x] = \{y \in D \mid g(y) = 0\}.$$

• In Step 2, we will show that $x$ is actually a global maximum of $f$ on the constraint
set
$$A[x] = \{y \in D \mid g(y) \ge 0\}.$$

• Finally, we will show in Step 3 that if $y$ is any other point in $D$, and $\lambda \in (0, 1)$,
then $f[\lambda x + (1 - \lambda) y] \ge \min\{f(x), f(y)\}$. Since $x$ was chosen arbitrarily, the
proof of the quasi-concavity of $f$ is complete.


So fix $x \in D$. We will show that $x$ is a strict local maximum of $f$ on $A'[x]$
by showing (a) that the constraint qualification is met at $x$, (b) that there is $\lambda$ such
that $(x, \lambda)$ meets the first-order conditions of the Theorem of Lagrange, and (c) that
$(x, \lambda)$ also meets the second-order conditions of Theorems 5.4 and 5.5 for a strict
local maximum.

First, note that since $(-1)^k |C_k(x)| > 0$ at each $k$, we cannot have $Df(x) = 0$.
Since $Dg(y) = -Df(x)$ at any $y \in \mathbb{R}^n$, and $Df(x) \ne 0$, we must have $\rho(Dg(y)) = 1$
at all $y$, and, in particular, $\rho(Dg(x)) = 1$. This establishes (a). Moreover, for $\lambda = 1$,
the pair $(x, \lambda)$ is a critical point of the Lagrangean $L(y) = f(y) + \lambda g(y)$, since
$DL(x) = Df(x) + \lambda Dg(x) = Df(x) - Df(x) = 0$. This establishes (b). Finally,
note that since $g$ is linear, we have $D^2 L(x) = D^2 f(x)$. From the definition of $g$, it
now follows that the condition

$$(-1)^k |C_k(x)| > 0, \quad k = 1, \ldots, n,$$

is exactly the same as the second-order condition required under Theorems 5.4
and 5.5. This establishes (c), proving that $x$ is a strict local maximum of $f$ on $A'[x]$,
and completing Step 1.

We turn to Step 2. Pick any $y \in A[x]$. We will show that $f(x) \ge f(y)$. Since $y$
was chosen arbitrarily in $A[x]$, Step 2 will be complete. Let $x(t) = tx + (1 - t)y$,
and define the function $F$ on $[0, 1]$ by

$$F(t) = f(x(t)).$$

Note that $F(0) = f(y)$, $F(1) = f(x)$, and that $F$ is $C^1$ on $[0, 1]$ with $F'(t) =
Df(x(t))(x - y)$. Let $A$ denote the set of minimizers of $F$ on $[0, 1]$, and let $t^* = \inf A$.
We claim that $t^* = 0$.

Suppose $t^* \ne 0$. We will show the existence of a contradiction. If $t^* \in (0, 1)$, then
we must, of course, have $F'(t^*) = 0$ since $t^*$ is an interior minimum of $F$ on $[0, 1]$,
and this implies we must have

$$Df(x(t^*))(x - y) = 0.$$

If $t^* = 1$, then we must have $F'(t^*) \le 0$, or $t^*$ could not be a minimum of $F$.
Therefore, we must have $Df(x(t^*))(x - y) \le 0$. However, at $t^* = 1$ we also have
$x(t^*) = x$, and since $y$ is in $A[x]$, we must have $g(y) = Df(x)(x - y) \ge 0$.
Combining these inequalities, we see that in this case also, we must have

$$Df(x(t^*))(x - y) = 0.$$

Now pick any $\eta \in (0, t^*)$. Observe that $x(t^* - \eta) - x(t^*) = -\eta(x - y)$, which
implies that

$$Df(x(t^*))[x(t^* - \eta) - x(t^*)] = 0.$$

By Step 1, $x(t^*)$ is a strict local maximum of $f$ on the constraint set

$$A'[x(t^*)] = \{y \in \mathbb{R}^n \mid Df(x(t^*))[x(t^*) - y] = 0\},$$

and the preceding equality shows that $x(t^* - \eta) \in A'[x(t^*)]$ for $\eta \in (0, t^*)$. This means
that for $\eta > 0$ but sufficiently small, we must have $f(x(t^*)) > f(x(t^* - \eta))$. This
contradicts the definition of $t^*$ as the smallest minimizer of $F$ on $[0, 1]$, and shows that
$t^* \in (0, 1]$ is impossible. Therefore, we must have $t^* = 0$. Step 2 is now complete,
since $t^* = 0$ implies $F(1) \ge F(0)$, which is the same as $f(x) \ge f(y)$.

This leaves Step 3. Pick any $y \in D$ and any $\lambda \in (0, 1)$, and let $z = \lambda x + (1 - \lambda) y$.
Observe that

$$Df(z)z = \lambda Df(z)x + (1 - \lambda) Df(z)y \ge \min\{Df(z)x, Df(z)y\}.$$

Suppose first that $Df(z)x \le Df(z)y$. Then, $Df(z)z \ge Df(z)x$ implies
$Df(z)(z - x) \ge 0$. Therefore, $x \in A[z]$, where, of course,

$$A[z] = \{w \in \mathbb{R}^n \mid Df(z)(z - w) \ge 0\}.$$

By Step 2, $z$ maximizes $f$ over $A[z]$, so we must have $f(z) \ge f(x)$.
Now suppose $Df(z)y \le Df(z)x$. Then, we must have $Df(z)(z - y) \ge 0$, so
$y \in A[z]$. Since $z$ maximizes $f$ over $A[z]$, in this case we must have $f(z) \ge f(y)$.
Thus, we must have either $f(z) \ge f(x)$ or $f(z) \ge f(y)$. This implies that
$f(z) \ge \min\{f(x), f(y)\}$, so the proof is complete. □


8.8 A Proof of the Theorem of Kuhn and Tucker under Quasi-Convexity 


We begin the proof of Theorem 8.13 by noting the simple fact that when h_i is
quasi-concave, the set D_i = {x | h_i(x) ≥ 0} is a convex set, and so then is the
feasible set D = ∩_{i=1}^{k} D_i. We now claim that


Lemma 8.15 Under the hypotheses of the theorem, it is the case that for any y ∈ D,
we have Df(x*)(y − x*) ≤ 0.


Proof of Lemma 8.15 By hypothesis, we have


Df(x*)(y − x*) = −∑_{i=1}^{k} λ_i Dh_i(x*)(y − x*).


The lemma will be established if we can show that the expression on the right-hand
side is nonpositive. We will prove that this is the case by proving that for each
i ∈ {1, ..., k}, it is the case that λ_i Dh_i(x*)(y − x*) ≥ 0.

So pick any y ∈ D, and any i ∈ {1, ..., k}. By definition, we must have h_i(x*) ≥ 0.
Suppose first that h_i(x*) > 0. Then, we must have λ_i = 0 by the Kuhn-Tucker
first-order conditions, and it follows that λ_i Dh_i(x*)(y − x*) = 0. Next suppose
h_i(x*) = 0. Since D is convex, x* + t(y − x*) = (1 − t)x* + ty ∈ D for
all t ∈ (0, 1), and therefore, we have h_i(x* + t(y − x*)) ≥ 0. It follows that for
t ∈ (0, 1), we also have


0 ≤ h_i(x* + t(y − x*)) / t = [h_i(x* + t(y − x*)) − h_i(x*)] / t,


and taking limits as t → 0+ establishes that Dh_i(x*)(y − x*) ≥ 0. Since λ_i ≥ 0, we
finally obtain λ_i Dh_i(x*)(y − x*) ≥ 0 in this case also. Thus, for any i ∈ {1, ..., k},
we must have λ_i Dh_i(x*)(y − x*) ≥ 0, and the lemma is proved. □

We return now to the proof of Theorem 8.13. We will show that the theorem is
true under each of the two identified conditions, [QC-1] Df(x*) ≠ 0, and [QC-2] f
is concave.


Case 1: [QC-1] Df(x*) ≠ 0
Since Df(x*) ≠ 0, there is w ∈ R^n such that Df(x*)w < 0. Let z = x* + w. Note
that we then have Df(x*)(z − x*) < 0.
Now pick any y ∈ D. For t ∈ (0, 1), let


y(t) = (1 − t)y + tz and x(t) = (1 − t)x* + tz.
Fix any t ∈ (0, 1). Then we have


Df(x*)(x(t) − x*) = t Df(x*)(z − x*) < 0,
Df(x*)(y(t) − x(t)) = (1 − t) Df(x*)(y − x*) ≤ 0,


where the inequality in the first expression follows from the definition of z, and
that in the second obtains from Lemma 8.15. When these inequalities are
summed, we have


Df(x*)(y(t) − x*) < 0,


implying, by Theorem 8.8, that f(y(t)) < f(x*). Since this holds for any t ∈ (0, 1),
and y(t) → y as t → 0, taking limits as t → 0 gives f(y) ≤ f(x*). Since y ∈ D was
chosen arbitrarily, this states precisely that x* is a global maximum of f on D.


Case 2: [QC-2] f is concave 


By repeating the arguments used in the proof of the Theorem of Kuhn and Tucker
under Convexity (Theorem 7.16), it is readily established that Df(x*)y ≤ 0 for all
y pointing into D at x*. Since f is concave and D is convex, the optimality of x* is
now a consequence of Theorem 7.20.
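
As a numerical sanity check of Lemma 8.15 and Theorem 8.13, the following Python sketch uses a hypothetical problem that does not appear in the text: maximize f(x) = x_1 + x_2 subject to h(x) = 1 − x_1² − x_2² ≥ 0. Here f and h are quasi-concave, Df ≠ 0 everywhere (so [QC-1] holds), and the Kuhn-Tucker conditions hold at x* = (1/√2, 1/√2) with λ = 1/√2. All function names and the random sampling scheme are illustrative assumptions; sampling can only fail to find a counterexample, it proves nothing.

```python
import math
import random

def f(x):
    return x[0] + x[1]

def h(x):
    return 1.0 - x[0] ** 2 - x[1] ** 2

def grad_f(x):
    return (1.0, 1.0)

def grad_h(x):
    return (-2.0 * x[0], -2.0 * x[1])

# Candidate point and multiplier for the hypothetical problem.
x_star = (1 / math.sqrt(2), 1 / math.sqrt(2))
lam = 1 / math.sqrt(2)

def kt_residual():
    """Largest absolute component of Df(x*) + lam * Dh(x*)."""
    gf, gh = grad_f(x_star), grad_h(x_star)
    return max(abs(gf[i] + lam * gh[i]) for i in range(2))

def sample_feasible(n, seed=0):
    """Draw n points of the unit disk D by rejection sampling."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        y = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if h(y) >= 0:
            points.append(y)
    return points

def check(n=2000, tol=1e-12):
    """Test Df(x*)(y - x*) <= 0 and f(y) <= f(x*) on sampled feasible y."""
    gf = grad_f(x_star)
    lemma_ok, max_ok = True, True
    for y in sample_feasible(n):
        directional = sum(gf[i] * (y[i] - x_star[i]) for i in range(2))
        lemma_ok = lemma_ok and directional <= tol
        max_ok = max_ok and f(y) <= f(x_star) + tol
    return lemma_ok, max_ok
```

Calling check() returns a pair of booleans: the first reports whether the inequality of Lemma 8.15 held at every sampled point, the second whether no sampled feasible point beat x*.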


8.9 Exercises 


1. Let f and g be real-valued functions on R. Define the function h: R² → R by
h(x, y) = f(x) + g(y).


Show that if f and g are both strictly concave functions on R, then h is a strictly
concave function on R². Give an example to show that if f and g are both
strictly quasi-concave functions on R, then h need not be a strictly quasi-concave
function on R².


2. Give an example of a strictly quasi-concave function f: R₊ → R which is also
strictly quasi-convex, or show that no such example is possible.

3. We have seen in this chapter that if f: R → R is a nondecreasing function,
then f is both quasi-concave and quasi-convex on R. Is this true if f is instead
non-increasing?


4. Let f_1, ..., f_l be functions mapping D ⊂ R^n into R, where D is convex. Let
a_1, ..., a_l be nonnegative numbers. Show that if for each i ∈ {1, ..., l}, f_i is
concave, then so is f, where f is defined by


f(x) = ∑_{i=1}^{l} a_i f_i(x), x ∈ D.


Give an example to show that if each f_i is only quasi-concave, then f need not
be quasi-concave.


5. Let f: D → R, where D ⊂ R^n is convex. Suppose f is a strictly quasi-concave
function. If f is also a concave function, is it necessarily strictly concave? Why
or why not?


6. Let g: R^n → R be quasi-concave, and let f: R → R be a nondecreasing function.
Show that h(x) = f[g(x)] is also quasi-concave.


7. Show that the “Cobb-Douglas” utility function u: R²₊ → R defined by


u(x_1, x_2) = x_1^α x_2^β, α, β > 0,
(a) is concave if α + β ≤ 1.
(b) is quasi-concave, but not concave, if α + β > 1.


Show also that h(x_1, x_2) = log(u(x_1, x_2)) is concave for any value of α > 0 and
β > 0.


8. Show that the function f: D ⊂ R² → R defined by f(x, y) = xy is
quasi-concave if D = R²₊, but not if D = R².


9. Give an example of a function u: R²₊ → R such that u is strictly increasing and
strictly quasi-concave on R²₊. [Note: Strictly increasing means that if x ≥ x′ and
x ≠ x′ then u(x) > u(x′). This rules out Cobb-Douglas type functions of the
form u(x, y) = x^a y^b for a, b > 0, since u(0, y) = u(x, 0) = 0 for all (x, y),
so u cannot be strictly increasing.]


10. Describe a continuous quasi-concave function u: R²₊ → R such that for all
c ∈ R, the lower-contour set L_u(c) = {x ∈ R²₊ | u(x) ≤ c} is a convex
set. Would such an example be possible if u were required to be strictly
quasi-concave? Why or why not?


11. A consumer gets utility not only out of the two goods that he consumes, but also
from the income he has left over after consumption. Suppose the consumer's
utility function is given by u(c_1, c_2, m) where c_i denotes the quantity consumed
of commodity i, and m ≥ 0 is the left-over income. Suppose further that u is
strictly increasing in each of the three arguments. Assuming that the consumer
has an income of I > 0, that the price of commodity i is p_i > 0, and that all the
usual nonnegativity constraints hold, answer the following questions:

(a) Set up the consumer's utility maximization problem, and describe the
Kuhn-Tucker first-order conditions.

(b) Explain under what further assumptions, if any, these conditions are also
sufficient for a maximum.


Parametric Continuity: The Maximum Theorem


The notion of a parametric family of optimization problems was defined in Section 2.2
of Chapter 2. To recall the basic notation, such a family is defined by a parameter
space Θ ⊂ R^l, and the specification for each θ ∈ Θ of two objects: a constraint
set D(θ) ⊂ R^n and an objective function f(·, θ) defined on D(θ). We will assume
throughout this chapter that we are dealing with maximization problems; that is, that
the problem in question is to solve at each θ ∈ Θ,


Maximize f(x, θ) subject to x ∈ D(θ).


Deriving the corresponding results for minimization problems is a routine exercise
and is left to the reader. As in Chapter 2, we will denote by f*(θ) the maximized value
function which describes the supremum of attainable rewards under the parameter
configuration θ; and by D*(θ) the set of maximizers in the problem at θ. That is,


f*(θ) = sup{f(x, θ) | x ∈ D(θ)}
D*(θ) = argmax{f(x, θ) | x ∈ D(θ)}.


We will sometimes refer to the pair (f*, D*) as the solutions of the given family of
parametric optimization problems.
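
To make the objects f* and D* concrete, here is a minimal Python sketch that approximates both on a finite grid. The helper name solve, the grid resolution, the tolerance, and the test problem f(x, θ) = θx − x² on D(θ) = [0, 1] are all assumptions made for illustration; a grid replaces the supremum by a maximum over finitely many points and so only approximates the true solution.

```python
def solve(f, feasible_grid, theta, tol=1e-12):
    """Return (f_star, maximizers) for one parameter value on a finite grid."""
    points = feasible_grid(theta)
    values = [f(x, theta) for x in points]
    f_star = max(values)
    # Keep every grid point whose value is within tol of the maximum.
    maximizers = [x for x, v in zip(points, values) if v >= f_star - tol]
    return f_star, maximizers

# Hypothetical illustration: f(x, theta) = theta*x - x**2 on D(theta) = [0, 1].
def f(x, theta):
    return theta * x - x * x

def feasible_grid(theta, n=1000):
    # D(theta) does not depend on theta in this example.
    return [i / n for i in range(n + 1)]

f_star, d_star = solve(f, feasible_grid, theta=1.0)
```

For θ = 1 the exact maximizer x = θ/2 = 0.5 happens to be a grid point, so the grid answer coincides with the analytical one, f*(1) = 1/4.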

In this chapter, we examine the issue of parametric continuity: under what
conditions do f*(·) and D*(·) vary continuously with the underlying parameters θ? At
an intuitive level, it is apparent that for continuity in the solutions to obtain, some
degree of continuity must be present in the primitives f(·, ·) and D(·) of the
problem. Being precise about this requires, first of all, a notion of continuity for a map
such as D(·) which takes points θ ∈ Θ into sets D(θ) ⊂ R^n.¹ Our analysis in this
chapter begins in Section 9.1 with the study of such point-to-set maps, which are
called correspondences.


¹Note also that the set of solutions D*(θ) will not, in general, be single-valued. Thus, the question of when
this set varies continuously with θ also requires a notion of continuity for point-to-set maps.

Section 9.2 is the centerpiece of this chapter. In subsection 9.2.1, we prove one of
the major results of optimization theory, the Maximum Theorem. Roughly speaking,
the Maximum Theorem states that continuity in the primitives is inherited by the
solutions, but not in its entirety; that is, some degree of continuity in the problem is
inevitably lost in the process of optimization. In subsection 9.2.2, we then examine the
effect of placing convexity restrictions on the primitives, in addition to the continuity
conditions required by the Maximum Theorem. We label the result derived here the
Maximum Theorem under Convexity. We show here that, analogous to the continuity
results of the Maximum Theorem, the convexity structure of the primitives is also
inherited by the solutions, but, again, not in its entirety.

Finally, Sections 9.3 and 9.4 present two detailed worked-out examples, designed
to illustrate the use of the material of Section 9.2.


9.1 Correspondences 


Let Θ and S be subsets of R^l and R^n, respectively. A correspondence Φ from Θ
to S is a map that associates with each element θ ∈ Θ a (nonempty) subset
Φ(θ) ⊂ S. To distinguish a correspondence notationally from a function, we will
denote a correspondence Φ from Θ to S by Φ: Θ → P(S), where P(S) denotes the
power set of S, i.e., the set of all nonempty subsets of S.

In subsection 9.1.1 below, we explore some definitions of continuity for a
correspondence, based on “natural” generalizations of the corresponding definition for
functions. Some additional structures on correspondences are then provided in
subsection 9.1.2. In subsection 9.1.3, the various definitions are related to each other
and characterized. Finally, the definitions for correspondences and functions are
compared in subsection 9.1.4.


9.1.1 Upper- and Lower-Semicontinuous Correspondences 


Any function f from Θ to S may also be viewed as a single-valued correspondence
from Θ to S. Thus, an intuitively appealing consideration to keep in mind in defining
a notion of continuity for correspondences is that the definition be consistent with
the definition of continuity for functions; that is, we would like the two definitions to
coincide when the correspondence in question is single-valued. Recall that a function
f: Θ → S is continuous at a point θ ∈ Θ if and only if for all open sets V such that
f(θ) ∈ V, there is an open set U containing θ such that for all θ′ ∈ Θ ∩ U, we have
f(θ′) ∈ V. In extending this definition to a notion of continuity for correspondences
Φ: Θ → P(S), a problem arises: there are at least two ways to replace the condition
that


f(·) ∈ V

with a condition appropriate for correspondences: namely, one can require either that

Φ(·) ⊂ V,


or that


Φ(·) ∩ V ≠ ∅.


These two conditions coincide when Φ is single-valued everywhere (for, in this case,
Φ(·) ⊂ V if and only if Φ(·) ∩ V ≠ ∅), but this is evidently not true if Φ is not
single-valued. This leads us to two different notions of continuity for correspondences.

First, a correspondence Φ: Θ → P(S) is said to be upper-semicontinuous or usc
at a point θ ∈ Θ if for all open sets V such that Φ(θ) ⊂ V, there exists an open set
U containing θ, such that θ′ ∈ U ∩ Θ implies Φ(θ′) ⊂ V. We say that Φ is usc on
Θ if Φ is usc at each θ ∈ Θ.

Second, the correspondence Φ is said to be lower-semicontinuous or lsc at θ ∈ Θ
if for all open sets V such that V ∩ Φ(θ) ≠ ∅, there exists an open set U containing
θ such that θ′ ∈ U ∩ Θ implies V ∩ Φ(θ′) ≠ ∅. The correspondence Φ is said to be
lsc on Θ if it is lsc at each θ ∈ Θ.

Combining these definitions, the correspondence Φ: Θ → P(S) is said to be
continuous at θ ∈ Θ if Φ is both usc and lsc at θ. The correspondence Φ is continuous
on Θ if Φ is continuous at each θ ∈ Θ.

The following examples illustrate these definitions. The first presents an example
of a correspondence that is usc but not lsc, the second describes one that is lsc but
not usc, and the third one that is both usc and lsc, and, therefore, continuous.


Example 9.1 Let Θ = S = [0, 2]. Define Φ: Θ → P(S) by


Φ(θ) = {0}, 0 ≤ θ < 1
Φ(θ) = S, 1 ≤ θ ≤ 2.


This correspondence is graphed in Figure 9.1. It is not very difficult to see that Φ
is both usc and lsc at all θ ≠ 1. For, suppose θ < 1. Let ε = (1 − θ)/2. Let U be
the open interval (θ − ε, θ + ε). Then, for all θ′ ∈ U ∩ Θ, we have Φ(θ′) = Φ(θ).
Therefore, if V is any open set such that Φ(θ) ⊂ V, we also have Φ(θ′) ⊂ V for
all θ′ ∈ U ∩ Θ; while if V is any open set such that V ∩ Φ(θ) ≠ ∅, we also have
V ∩ Φ(θ′) ≠ ∅ for all θ′ ∈ U ∩ Θ. Therefore, Φ is both usc and lsc at all θ < 1. A
similar argument establishes that it is both usc and lsc at all θ > 1.

At θ = 1, if V is any open set containing Φ(1) = [0, 2], then V contains Φ(θ′)
for all θ′ ∈ Θ, so Φ is clearly usc at θ = 1. However, Φ is not lsc at θ = 1. To see
this, consider the open interval V = (1/2, 3/2). Clearly, V has a nonempty intersection
with Φ(1), but an empty intersection with Φ(θ) for all θ < 1. Since any open set
U containing θ = 1 must also contain at least one point θ′ < 1, it follows that for
this choice of V, there is no open set U containing θ = 1 which is also such that
Φ(θ′) ∩ V ≠ ∅ for all θ′ ∈ U ∩ Θ. Therefore, Φ is not lsc at θ = 1. □


Fig. 9.1. A Correspondence That Is Usc, But Not Lsc


Fig. 9.2. A Correspondence That Is Lsc, But Not Usc


Example 9.2 Let Θ = S = [0, 2]. Define Φ: Θ → P(S) by


Φ(θ) = {1}, 0 ≤ θ ≤ 1
Φ(θ) = S, 1 < θ ≤ 2.


A graphical illustration of Φ is provided in Figure 9.2. The same argument as in the
earlier example establishes that Φ is both usc and lsc at all θ ≠ 1. At θ = 1, Φ(θ)
can have a nonempty intersection with an open set V if and only if 1 ∈ V. Since we
also have 1 ∈ Φ(θ′) for all θ′ ∈ Θ, it follows that Φ is lsc at 1.

However, Φ is not usc at θ = 1: the open interval V = (1/2, 3/2) contains Φ(1), but
fails to contain Φ(θ′) for any θ′ > 1. □


Fig. 9.3. A Continuous Correspondence


Example 9.3 Let Θ ⊂ R^l and S ⊂ R^n be arbitrary, and let K be any subset of S.
Define Φ: Θ → P(S) by


Φ(θ) = K, θ ∈ Θ.


Figure 9.3 presents this correspondence in a graph. A correspondence such as Φ is
called a “constant-valued” or simply a “constant” correspondence. Pick any open set
V, and any θ ∈ Θ. Then,


• Φ(θ) ⊂ V if and only if Φ(θ′) ⊂ V for all θ′ ∈ Θ, and
• Φ(θ) ∩ V ≠ ∅ if and only if Φ(θ′) ∩ V ≠ ∅ for all θ′ ∈ Θ.


It follows that a constant correspondence is both usc and lsc at all points in its domain.
□
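
The two definitions can be probed numerically on Example 9.2. The Python sketch below is an illustration only: it represents each value Φ(θ) as a closed interval, fixes the single test set V = (1/2, 3/2), and evaluates both conditions along sequences approaching θ = 1. The helper names contained_in and meets are assumptions introduced here. A finite check of this kind can exhibit a failure of usc for a given V, but it can only be consistent with lsc, never prove it.

```python
def phi(theta):
    """Example 9.2: each value is a closed interval [lo, hi], returned as a pair."""
    if 0 <= theta <= 1:
        return (1.0, 1.0)      # the singleton {1}
    if 1 < theta <= 2:
        return (0.0, 2.0)      # the whole set S = [0, 2]
    raise ValueError("theta must lie in [0, 2]")

def contained_in(interval, V):
    """Is the closed interval [lo, hi] a subset of the open interval V = (a, b)?"""
    (lo, hi), (a, b) = interval, V
    return a < lo and hi < b

def meets(interval, V):
    """Does the closed interval [lo, hi] intersect the open interval V = (a, b)?"""
    (lo, hi), (a, b) = interval, V
    return lo < b and hi > a

V = (0.5, 1.5)
right_of_one = [1 + 1 / m for m in range(1, 200)]   # approaches 1 from above
left_of_one = [1 - 1 / m for m in range(1, 200)]    # approaches 1 from below
nearby = right_of_one + left_of_one

# V contains Phi(1), so usc at 1 would need Phi(theta') inside V near 1.
usc_condition_for_V = all(contained_in(phi(t), V) for t in nearby)
# V meets Phi(1), so lsc at 1 needs Phi(theta') to meet V near 1.
lsc_condition_for_V = all(meets(phi(t), V) for t in nearby)
```

The usc condition fails for this V because Φ(θ′) = [0, 2] for every θ′ > 1, while the lsc condition holds at every sampled point, in line with the discussion in Example 9.2.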


9.1.2 Additional Definitions 
In this subsection, we define some additional concepts pertaining to correspondences.
These are extremely useful in applications, but are also of help in understanding and
characterizing the definitions of semicontinuity given above.
So let Θ ⊂ R^n and S ⊂ R^l. A correspondence Φ: Θ → P(S) is said to be


1. closed-valued at θ ∈ Θ if Φ(θ) is a closed set;
2. compact-valued at θ ∈ Θ if Φ(θ) is a compact set; and
3. convex-valued at θ ∈ Θ if Φ(θ) is a convex set.


If a correspondence Φ is closed-valued (resp. compact-valued, convex-valued) at all
θ ∈ Θ, then we will simply say that Φ is a closed-valued (resp. compact-valued,
convex-valued) correspondence.

The graph of a correspondence Φ: Θ → P(S), denoted Gr(Φ), is defined as


Gr(Φ) = {(θ, s) ∈ Θ × S | s ∈ Φ(θ)}.
Note that Gr(Φ) is a subset of R^n × R^l, since Θ ⊂ R^n and S ⊂ R^l.

The correspondence Φ is said to be a closed correspondence or a closed-graph
correspondence if Gr(Φ) is a closed subset of R^n × R^l, that is, if it is the case that
for all sequences {θ_m} in Θ such that θ_m → θ ∈ Θ,


s_m ∈ Φ(θ_m), s_m → s implies s ∈ Φ(θ).


Similarly, Φ is said to have a convex graph if Gr(Φ) is a convex subset of R^n × R^l,
i.e., if for any θ and θ′ in Θ, and any s ∈ Φ(θ) and s′ ∈ Φ(θ′), it is the case that


λs + (1 − λ)s′ ∈ Φ[λθ + (1 − λ)θ′], λ ∈ (0, 1).


Note that a closed-graph correspondence is necessarily closed-valued, but the
converse is not true. Similarly, every correspondence with a convex graph is also
convex-valued, but the converse is again false. Here is an example illustrating both
points:


Example 9.4 Let Θ = S = [0, 1], and define Φ by


Φ(θ) = {θ}, 0 ≤ θ < 1
Φ(θ) = {0}, θ = 1.


Clearly, Φ(θ) is compact and convex for each θ ∈ Θ, so Φ is both convex-valued
and compact-valued. It does not have a closed graph since the sequence {(θ_m, s_m)}
defined by θ_m = s_m = 1 − 1/m is in the graph of Φ, but this sequence converges
to (1, 1), which is not in the graph of Φ. Nor does Φ have a convex graph since
(θ, s) = (0, 0) and (θ′, s′) = (1, 0) are both in the graph of Φ, but the convex
combination (1/2, 0) = ½(θ, s) + ½(θ′, s′) is not in the graph of Φ. □
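
Both failures in Example 9.4 can be verified mechanically. The following Python sketch, offered purely as an illustration, encodes membership in Gr(Φ) with exact rational arithmetic (so no rounding is involved) and then tests the sequence and the convex combination used in the example. The function name in_graph is an assumption introduced here.

```python
from fractions import Fraction

def in_graph(theta, s):
    """Membership in Gr(Phi) for Example 9.4, using exact rational arithmetic."""
    if 0 <= theta < 1:
        return s == theta      # Phi(theta) = {theta}
    if theta == 1:
        return s == 0          # Phi(1) = {0}
    return False

# Points on the sequence theta_m = s_m = 1 - 1/m lie in the graph ...
sequence_in_graph = all(
    in_graph(1 - Fraction(1, m), 1 - Fraction(1, m)) for m in range(1, 1000)
)
# ... but their limit (1, 1) does not, so the graph is not closed.
limit_in_graph = in_graph(Fraction(1), Fraction(1))

# (0, 0) and (1, 0) lie in the graph, but their midpoint (1/2, 0) does not,
# so the graph is not convex.
p, q = (Fraction(0), Fraction(0)), (Fraction(1), Fraction(0))
midpoint = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
endpoints_in_graph = in_graph(*p) and in_graph(*q)
midpoint_in_graph = in_graph(*midpoint)
```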


9.1.3 A Characterization of Semicontinuous Correspondences 


We present in this subsection a number of results that characterize semicontinuous
correspondences. The discussion here is based on the excellent summary of
Hildenbrand (1974).

Some preliminary definitions will help simplify the exposition. Given a set X ⊂ R^k
and a subset A of X, we will say that A is open in X if there exists an open set U
in R^k such that A = X ∩ U. We will also say that the subset B of X is closed in
X if the set {x ∈ X | x ∉ B} is open in X. (The set {x ∈ X | x ∉ B} is called the
complement of B in X. Abusing notation, we will denote this set by B^c.) Thus, for
instance, the set [0, a) is open in the closed unit interval [0, 1] for any a ∈ (0, 1),
while the set (0, 1) is closed in itself.

Let a correspondence Φ: Θ → P(S) be given, where Θ ⊂ R^n and S ⊂ R^l. Let
W be any set in R^l. Define the upper inverse of W under Φ, denoted Φ₊⁻¹(W), by


Φ₊⁻¹(W) = {θ ∈ Θ | Φ(θ) ⊂ W},

and the lower inverse of W under Φ, denoted simply Φ⁻¹(W), by

Φ⁻¹(W) = {θ ∈ Θ | Φ(θ) ∩ W ≠ ∅}.


Observe that the definitions of lower- and upper-inverses coincide with each other,
as well as with the definition of the inverse of a function, when the correspondence
is single-valued.


Proposition 9.5 The following conditions are equivalent:


1. Φ is usc on Θ.
2. Φ₊⁻¹(G) is open in Θ for every G that is open in S.
3. Φ⁻¹(F) is closed in Θ for every F closed in S.


Proof We prove the following series of implications:
1 ⇒ 2 ⇒ 3 ⇒ 2 ⇒ 1.


So suppose first that Condition 1 holds. If Condition 2 fails, there is an open set G
in S such that the set Φ₊⁻¹(G) is not open in Θ. Therefore, there is θ ∈ Φ₊⁻¹(G) such
that for each ε > 0, there is θ′(ε) ∈ B(θ, ε) ∩ Θ such that Φ(θ′(ε)) is not contained
in G. But then Φ is not usc at θ, so Condition 1 also fails.

Next we show that Condition 2 holds if and only if Condition 3 holds. Given any
two sets E and F, define E − F to be the set E ∩ F^c, that is, E − F is the set of
elements of E that are not in F. Now, for all A ⊂ S,


Φ₊⁻¹(S − A) = {θ ∈ Θ | Φ(θ) ⊂ S − A}
= {θ ∈ Θ | Φ(θ) ∩ A = ∅}
= Θ − {θ ∈ Θ | Φ(θ) ∩ A ≠ ∅}
= Θ − Φ⁻¹(A).
The desired result obtains since a set H is open in Θ (resp. S) if and only if Θ − H
is closed in Θ (resp. S − H is closed in S).
It remains to be shown that Condition 2 implies Condition 1. Suppose Condition 2
holds. Let θ ∈ Θ, and let V be any open set containing Φ(θ). By Condition 2, the
set W = Φ₊⁻¹(V) is open in Θ. W is nonempty, since θ ∈ W. Therefore, if U is any
open set such that U ∩ Θ = W,² we have θ ∈ U and Φ(θ′) ⊂ V for all θ′ ∈ U ∩ Θ,
which states precisely that Φ is usc at θ. Since θ was arbitrary, we are done. □
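
The set identity used in the middle step of the proof, that the upper inverse of S − A equals Θ minus the lower inverse of A, is purely set-theoretic and can be checked exhaustively on a small finite example. The Python sketch below does this for a hypothetical correspondence on three-point sets; the particular sets and the function names are assumptions chosen only for illustration.

```python
from itertools import chain, combinations

Theta = {0, 1, 2}
S = {"a", "b", "c"}
# A hypothetical nonempty-valued correspondence on finite sets.
Phi = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}}

def upper_inverse(W):
    """All theta whose image Phi(theta) is contained in W."""
    return {t for t in Theta if Phi[t] <= W}

def lower_inverse(W):
    """All theta whose image Phi(theta) meets W."""
    return {t for t in Theta if Phi[t] & W}

def subsets(X):
    """Generate every subset of the finite set X."""
    items = sorted(X)
    return (set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

# Check upper_inverse(S - A) == Theta - lower_inverse(A) for every A in S.
identity_holds = all(
    upper_inverse(S - A) == Theta - lower_inverse(A) for A in subsets(S)
)
```

On finite sets topology plays no role, so this confirms only the complement identity itself, not the statements about open and closed sets.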


²Such a set must exist since W is open in Θ.


Proposition 9.6 The following conditions are equivalent:
1. Φ is lsc.
2. Φ₊⁻¹(F) is closed for every F that is closed in S.
3. Φ⁻¹(G) is open for every G that is open in S.


Proof This result may be proved in a manner similar to the proof of Proposition 9.5.
The details are left as an exercise. □


Proposition 9.7 Let Φ: Θ → P(S) be a compact-valued, usc correspondence.
Then, if K ⊂ Θ is compact, so is Φ(K) = {s ∈ S | s ∈ Φ(θ) for some θ ∈ K}. That
is, compact-valued usc correspondences preserve compactness.


Proof We will use the characterization of compactness provided in Theorem 1.36
of Chapter 1, that a set E ⊂ R^n is compact if and only if every open cover of E has
a finite subcover.

Let (V_α)_{α∈A} be an open cover of Φ(K). Pick any θ ∈ K. Then, Φ(θ) ⊂ Φ(K) ⊂
∪_{α∈A} V_α. Since Φ(θ) is compact by hypothesis, there exists a finite subcollection
B_θ ⊂ A such that Φ(θ) ⊂ ∪_{β∈B_θ} V_β. Let V_θ = ∪_{β∈B_θ} V_β. As the union of open sets,
V_θ is open.

By Proposition 9.5, the set W_θ = Φ₊⁻¹(V_θ) is open in Θ. Trivially, θ ∈ W_θ.
Therefore, we have K ⊂ ∪_{θ∈K} W_θ, which means that (W_θ)_{θ∈K} is an open cover of
the compact set K. Therefore, there exist finitely many indices θ_1, ..., θ_l ∈ K such
that


K ⊂ ∪_{i=1}^{l} W_{θ_i} = ∪_{i=1}^{l} Φ₊⁻¹(V_{θ_i}).


Therefore, the collection (V_{θ_i})_{i=1}^{l} is an open cover of Φ(K). Since each V_{θ_i} is the
union of only finitely many members of (V_α)_{α∈A}, the sets V_β with β ∈ B_{θ_1} ∪ ··· ∪ B_{θ_l}
form a finite subcover of the open cover (V_α)_{α∈A}.

We have shown that an arbitrary open cover of Φ(K) must have a finite subcover,
which is precisely the statement that Φ(K) is compact. □


Proposition 9.8 Let Φ: Θ → P(S) be a compact-valued correspondence. Then, Φ
is usc at θ ∈ Θ if and only if for all sequences θ_p → θ ∈ Θ and for all sequences
s_p ∈ Φ(θ_p), there is a subsequence s_{k(p)} of s_p such that s_{k(p)} converges to some
s ∈ Φ(θ).


Proof Suppose Φ is usc at θ ∈ Θ. Suppose also that θ_p → θ, and s_p ∈ Φ(θ_p) for all
p. We are required to show that (i) there exists a convergent subsequence s_{k(p)} of s_p,
and (ii) the limit s of this subsequence is in Φ(θ). The set K = {θ} ∪ {θ_1, θ_2, θ_3, ...} is
clearly compact. Therefore, by Proposition 9.7, so is Φ(K). Hence, {s_p} is a sequence
in a compact set, and possesses a convergent subsequence, denoted (say) s_{k(p)}. This
proves (i). Let s = lim_{p→∞} s_{k(p)}. If s ∉ Φ(θ), then there is a closed neighborhood
G ⊃ Φ(θ) such that s ∉ G. But Φ(θ_p) ⊂ G for sufficiently large p, so s_{k(p)} ∈ G,
and since s_{k(p)} → s, we must have s ∈ G after all, a contradiction. This proves (ii).

Conversely, suppose that for all θ_p → θ and for all s_p ∈ Φ(θ_p), there is s_{k(p)}
converging to some s ∈ Φ(θ). Suppose also that Φ is compact-valued. Suppose Φ
is not usc at some θ ∈ Θ, that is, there exists an open set V containing Φ(θ) such
that for all open sets U containing θ, there exists θ′ ∈ U ∩ Θ such that Φ(θ′) is not
contained in V. For m = 1, 2, ..., let U_m be the open ball B(θ, 1/m) around θ, and let
θ_m ∈ U_m ∩ Θ be such that Φ(θ_m) is not contained in V. Pick any point s_m ∈ Φ(θ_m)
with s_m ∉ V. By construction, it is the case that θ_m → θ. Since s_m ∈ Φ(θ_m) for each
m, it is true by hypothesis that there is a subsequence s_{k(m)} of s_m and s ∈ Φ(θ) such
that s_{k(m)} → s. But s_m ∉ V for each m, and V is open. Therefore, s ∉ V, which
contradicts the assumption that Φ(θ) ⊂ V. □


Proposition 9.9 Suppose the correspondence Φ: Θ → P(S) is lsc at θ, and s ∈
Φ(θ). Then, for all sequences θ_m → θ, there is s_m ∈ Φ(θ_m) such that s_m → s.

Conversely, let Φ: Θ → P(S), and let θ ∈ Θ, s ∈ Φ(θ). Suppose that for all
sequences θ_m → θ, there is a subsequence k(m) of m, and s_{k(m)} ∈ Φ(θ_{k(m)}) such
that s_{k(m)} → s. Then, Φ is lsc at θ.


Proof Suppose Φ is lsc at θ. Let s ∈ Φ(θ), and let {θ_m} be a sequence in Θ such
that θ_m → θ. For p = 1, 2, ..., let B(s, 1/p) denote the open ball in R^l with center s
and radius 1/p. Since Φ is lsc at θ, it is the case that for each p, there exists an open
set U_p containing θ such that Φ(θ′) ∩ B(s, 1/p) ≠ ∅ for each θ′ ∈ U_p ∩ Θ. Since
θ_m → θ, it is the case that for each p, there exists m(p) such that θ_m ∈ U_p for all
m ≥ m(p). Obviously, the sequence m(p) can be chosen to be increasing, i.e., to
satisfy m(p + 1) > m(p) for all p. Now, if m is such that m(p) ≤ m < m(p + 1),
define s_m to be any point in the set Φ(θ_m) ∩ B(s, 1/p). (For m < m(1), let s_m be any
point of Φ(θ_m).) Since m ≥ m(p) implies θ_m ∈ U_p, it follows from the definition of
U_p that this intersection is nonempty. By construction, the distance between s_m and
s goes to zero as m → ∞. This proves the first result.

Now suppose that for all s ∈ Φ(θ) and for all sequences θ_m → θ, there is a
subsequence {θ_{k(m)}} of {θ_m} and s_{k(m)} ∈ Φ(θ_{k(m)}) such that s_{k(m)} → s. Suppose Φ
is not lsc at θ. Then, there is an open set V such that V ∩ Φ(θ) ≠ ∅, and such that
for all open sets U containing θ, there is θ′ ∈ U ∩ Θ such that Φ(θ′) ∩ V = ∅. For
p = 1, 2, ..., let U_p be the open ball B(θ, 1/p) with center θ and radius 1/p. Let
θ_p ∈ U_p ∩ Θ be any point such that Φ(θ_p) ∩ V = ∅. Pick any s ∈ Φ(θ) ∩ V. By
hypothesis in this case, there is a subsequence k(p) and s_{k(p)} ∈ Φ(θ_{k(p)}) such that
s_{k(p)} → s. Since V is open, and s ∈ V, it must be the case that s_{k(p)} ∈ V for all p
sufficiently large. But this contradicts the definition of θ_p as a point where
Φ(θ_p) ∩ V = ∅. □

We close this section with an important warning. It is not uncommon in the
economics literature to define an upper-semicontinuous correspondence as one with a
closed graph. Such a definition is legitimate provided the range space S of the
correspondence is compact, and the correspondence itself is compact-valued, for, as
Proposition 9.8 then shows, a correspondence is usc if and only if it has a closed
graph. However, in general, the link between usc and closed-graph correspondences
is weak: it is not the case that usc correspondences necessarily have closed graphs,
nor is it true that all closed-graph correspondences are usc (indeed, even if they are
also compact-valued). Consider the following examples:


Example 9.10 (An upper-semicontinuous correspondence that does not have a
closed graph) Let Θ = S = [0, 1], and let Φ(θ) = (0, 1) for all θ ∈ Θ. Then,
Φ is usc (and, in fact, continuous) on Θ since it is a constant correspondence, but
Gr(Φ) is not closed. Indeed, Φ is not even closed-valued. □


Example 9.11 (A compact-valued, closed-graph correspondence that is not usc)
Let Θ = S = R₊. Define Φ: Θ → P(S) by


Φ(θ) = {0}, θ = 0
Φ(θ) = {0, 1/θ}, θ > 0.


It is easily checked that Gr(Φ) is a closed subset of R²₊. Evidently, Φ is also a
compact-valued correspondence. However, Φ is not usc at θ = 0: if V is any bounded
open set containing Φ(0) = {0}, then V cannot contain Φ(θ) for any θ which is
positive, but sufficiently close to 0. □
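
The failure of upper-semicontinuity in Example 9.11 can also be read through the sequence criterion of Proposition 9.8. The Python sketch below, an illustration under the assumption that Φ(θ) = {0, 1/θ} for θ > 0, takes θ_m = 1/m and the selection s_m = 1/θ_m ∈ Φ(θ_m); this selection is unbounded, so it cannot have a convergent subsequence. The names escapes and selection and the bound of 100 are hypothetical choices made for the example.

```python
def phi(theta):
    """Example 9.11 on the nonnegative reals: {0} at 0, {0, 1/theta} otherwise."""
    return {0.0} if theta == 0 else {0.0, 1.0 / theta}

def escapes(theta, bound):
    """True if Phi(theta) is not contained in the open interval (-bound, bound)."""
    return any(abs(s) >= bound for s in phi(theta))

bound = 100.0
thetas = [1.0 / m for m in range(1, 1001)]      # theta_m -> 0
selection = [max(phi(t)) for t in thetas]        # s_m = 1/theta_m in Phi(theta_m)

# Every theta_m with m > 100 has an image that leaves (-bound, bound).
small_thetas_escape = all(escapes(1.0 / m, bound) for m in range(101, 1001))
```

Since the selection grows without bound as θ_m → 0, no bounded open set containing Φ(0) = {0} can contain Φ(θ_m) for all large m.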


9.1.4 Semicontinuous Functions and Semicontinuous Correspondences 


We have mentioned above that when Φ is single-valued at each θ ∈ Θ, it can be
viewed as a function from Θ to S. This section explores further the relationship
between the continuity of a single-valued correspondence, and the continuity of
the same map when it is viewed as a function. The following preliminary result is
immediate from the definitions, and was, indeed, used to motivate the concept of
semicontinuity.


Theorem 9.12 A single-valued correspondence that is semicontinuous (whether
usc or lsc) is continuous when viewed as a function. Conversely, every continuous
function, when viewed as a single-valued correspondence, is both usc and lsc.

There are notions of upper- and lower-semicontinuity for functions also, and
somewhat unfortunately, the terminology is misleading.³ We elaborate on this below.
A function f: D ⊂ R^n → R is said to be upper-semicontinuous or usc at x ∈ D
if for all sequences x_k → x, lim sup_{k→∞} f(x_k) ≤ f(x). The function f is said to
be lower-semicontinuous or lsc at x if −f is usc at x, i.e., if lim inf_{k→∞} f(x_k) ≥
f(x).
A semicontinuous (usc or lsc) function need not be a continuous function; indeed,
a function is continuous (if and) only if it is both usc and lsc. The importance of
semicontinuous functions for optimization theory lies in the following generalization
of the Weierstrass Theorem:


Theorem 9.13 (Generalized Weierstrass Theorem) Let D ⊂ R^n be compact and
f: D → R.


1. If f is usc on D, it attains its supremum on D, i.e., there is z_1 ∈ D such that
f(z_1) ≥ f(x) for all x ∈ D.

2. If f is lsc on D, it attains its infimum on D, i.e., there exists z_2 ∈ D such that
f(z_2) ≤ f(x) for all x ∈ D.


Proof Suppose f is usc on D. Let a = sup f(D). By definition of the supremum,
there must exist a sequence {a_p} in f(D) such that a_p → a. Since a_p ∈ f(D)
for each p, there must be x_p ∈ D such that f(x_p) = a_p, and f(x_p) → a. Since
D is compact, there must exist a subsequence x_{k(p)} of x_p and z_1 ∈ D such that
x_{k(p)} → z_1. Obviously, f(x_{k(p)}) → a, and since f is usc, we must have f(z_1) ≥
lim sup_{p→∞} f(x_{k(p)}) = a, so by the definition of a, f(z_1) = a, which establishes
Part 1. Since f is usc if and only if −f is lsc, Part 2 is also proved. □
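
A small numerical illustration may help fix the asymmetry in Theorem 9.13. The function below is a hypothetical example, not taken from the text: on D = [0, 1], set f(0) = 1 and f(x) = x for x > 0. It is usc everywhere (at 0 the lim sup of nearby values is 0 ≤ f(0)) but not lsc at 0, so Part 1 applies and Part 2 does not. The Python sketch evaluates f on two grids; the grid sizes are arbitrary assumptions.

```python
def f(x):
    """A usc but not lsc function on [0, 1]: f(0) = 1 and f(x) = x for x > 0."""
    return 1.0 if x == 0 else x

def grid(n):
    """The n + 1 equally spaced points of [0, 1]."""
    return [i / n for i in range(n + 1)]

coarse = [f(x) for x in grid(10)]
fine = [f(x) for x in grid(10_000)]

# The supremum 1 is attained (at x = 0 and x = 1) on every grid.
sup_attained = max(coarse) == 1.0 and max(fine) == 1.0
# The grid minimum keeps shrinking toward the infimum 0 without reaching it.
min_coarse, min_fine = min(coarse), min(fine)
```

The grid minimum falls as the grid is refined, consistent with an infimum of 0 that is never attained, while the maximum is attained on every grid.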


Since a continuous function is both lsc and usc, it easily follows that a
correspondence which is single-valued and usc is both usc and lsc, when viewed as a function.
However, a function which is only usc but not lsc (or only lsc and not usc) is neither
lsc nor usc when viewed as a single-valued correspondence. This is quite easy to see.
Suppose that f were, for instance, usc, but not lsc. Then, there must exist a point x
and a sequence x_p → x such that f(x) > lim_p f(x_p). Let ε = f(x) − lim_p f(x_p).
Take the open ball (f(x) − ε/2, f(x) + ε/2) around f(x). This open ball does not
contain f(x_p) for any large p, so, viewed as a single-valued correspondence, f is
neither lsc nor usc.


³Perhaps on this account, it is not uncommon in the economics literature to use the adjective
“hemicontinuous” when referring to correspondences, and to reserve “semicontinuous” for functions. Since very little
of this book involves semicontinuous functions, we continue to use “semicontinuous” in both cases.


9.2 Parametric Continuity: The Maximum Theorem


This section describes the two main results of this chapter. In subsection 9.2.1, we
present one of the fundamental results of optimization theory, the Maximum
Theorem. This is followed in subsection 9.2.2 by an examination of the problem when
additional convexity conditions are imposed on the family of optimization problems.⁴
Throughout this section, it will be assumed that Θ ⊂ R^l and S ⊂ R^n, where l and n
are arbitrary positive integers.


9.2.1 The Maximum Theorem 


The Maximum Theorem is essentially a statement that if the primitives of a parametric
family of optimization problems possess a sufficient degree of continuity, then the
solutions will also be continuous, albeit not to the same degree.


Theorem 9.14 (The Maximum Theorem) Let f: S × Θ → R be a continuous
function, and D: Θ → P(S) be a compact-valued, continuous correspondence. Let
f*: Θ → R and D*: Θ → P(S) be defined by


f*(θ) = max{f(x, θ) | x ∈ D(θ)}
D*(θ) = argmax{f(x, θ) | x ∈ D(θ)} = {x ∈ D(θ) | f(x, θ) = f*(θ)}.


Then f* is a continuous function on Θ, and D* is a compact-valued,
upper-semicontinuous correspondence on Θ.


Proof We will appeal to Propositions 9.8 and 9.9. So let $\theta \in \Theta$, and let $\theta_m$ be a
sequence in $\Theta$ converging to $\theta$. Pick $x_m \in D^*(\theta_m)$. By Proposition 9.8, there exists a
subsequence $x_{k(m)}$ of $x_m$ such that $x_{k(m)} \to x \in D(\theta)$. The theorem will be proved if
we can show that $f(x, \theta) = f^*(\theta)$, for this will establish not only the continuity of
$f^*$ at $\theta$, but also (by Proposition 9.8) the upper-semicontinuity of the correspondence
$D^*$ at $\theta$.

Since $f$ is continuous on $S \times \Theta$, it is the case that
$f(x_{k(m)}, \theta_{k(m)}) \to f(x, \theta)$. Suppose there were $x^* \in D(\theta)$ such that
$f(x^*, \theta) > f(x, \theta)$. We show a contradiction must result.

Since $D$ is lsc at $\theta$, $x^* \in D(\theta)$, and $\theta_{k(m)} \to \theta$, there is, by Proposition 9.9, a
further subsequence $l(m)$ of $k(m)$ and $x^*_{l(m)} \in D(\theta_{l(m)})$ such that $x^*_{l(m)} \to x^*$. It
follows from the continuity of $f$ that

$$\lim_{m \to \infty} f(x^*_{l(m)}, \theta_{l(m)}) = f(x^*, \theta) > f(x, \theta) = \lim_{m \to \infty} f(x_{l(m)}, \theta_{l(m)}).$$


⁴ The mathematics literature on parametric optimization contains a large number of related results, which
examine the consequences of weakening or otherwise altering the hypotheses of the Maximum Theorem.
For details, the reader is referred to Berge (1963) or Bank, et al. (1983).
But this means that for all sufficiently large $m$, we must have

$$f(x^*_{l(m)}, \theta_{l(m)}) > f(x_{l(m)}, \theta_{l(m)}),$$

and therefore, $x_{l(m)} \notin D^*(\theta_{l(m)})$, a contradiction. □

It is important to reiterate that although we assume full continuity in the primitives
of the problem, the Maximum Theorem claims only that upper-semicontinuity of the
optimal action correspondence will result, and not lower-semicontinuity. This raises
two questions. First, is it possible, under the theorem's hypotheses, to strengthen
the conclusions and obtain full continuity of $D^*$? Alternatively, is it possible to
obtain the same set of conclusions under a weaker hypothesis, namely, that $D$ is only
upper-semicontinuous, and not necessarily also lower-semicontinuous, on $\Theta$? The
following examples show, respectively, that the answer to each of these questions is
in the negative.


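Example 9.15 Let $\Theta = [0, 1]$ and $S = [1, 2]$. Define $f\colon S \times \Theta \to \mathbb{R}$ and
$D\colon \Theta \to P(S)$ by

$$f(x, \theta) = x^{\theta}, \quad (x, \theta) \in S \times \Theta,$$

and $D(\theta) = [1, 2]$ for all $\theta \in \Theta$. Note that $D(\cdot)$ is continuous and compact-valued
on $\Theta$, while $f$ is continuous on $S \times \Theta$.
For $\theta > 0$, $f$ is strictly increasing on $S$, so we have

$$D^*(\theta) = \{2\}, \quad \theta > 0.$$

At $\theta = 0$, however, $f$ is identically equal to unity on $D(\theta)$, so

$$D^*(\theta) = [1, 2], \quad \theta = 0.$$

Clearly, $D^*(\theta)$ is usc, but not lsc, at $\theta = 0$. □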
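The jump in the set of maximizers can be seen numerically. The following is a minimal sketch, assuming a finite grid on $S = [1, 2]$ stands in for the interval; the helper name `argmax_set` and the grid resolution are illustrative choices, not part of the text.

```python
def argmax_set(theta, grid, tol=1e-12):
    """Grid points of D(theta) = [1, 2] that maximize f(x, theta) = x**theta."""
    values = [x ** theta for x in grid]
    best = max(values)
    return [x for x, v in zip(grid, values) if best - v <= tol]

# Grid on S = [1, 2]; the last point is exactly 2.0.
grid = [1 + k / 100 for k in range(101)]

for theta in (0.5, 0.1, 0.01, 0.0):
    maximizers = argmax_set(theta, grid)
    # For theta > 0 only x = 2 survives; at theta = 0 every grid point does.
    print(theta, min(maximizers), max(maximizers), len(maximizers))
```

For every positive parameter the maximizer set is the single point 2, while at the parameter value 0 it is the whole grid. Points such as $x = 1$ in $D^*(0)$ cannot be approached by maximizers at nearby parameters, which is how lower-semicontinuity fails.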


Example 9.16 Let $S = \Theta = [0, 1]$. Define $f\colon S \times \Theta \to \mathbb{R}$ and
$D\colon \Theta \to P(S)$ by

$$f(x, \theta) = x, \quad (x, \theta) \in S \times \Theta,$$

and

$$D(\theta) = \begin{cases} [0, 1], & \theta = 0, \\ \{0\}, & \theta > 0. \end{cases}$$

Trivially, $f$ is continuous on $S \times \Theta$. The correspondence $D(\cdot)$ is continuous and
compact-valued everywhere on $\Theta$ except at $\theta = 0$, where it is usc, but not lsc. We
have

$$f^*(\theta) = \begin{cases} 1, & \theta = 0, \\ 0, & \theta > 0, \end{cases}
\qquad
D^*(\theta) = \begin{cases} \{1\}, & \theta = 0, \\ \{0\}, & \theta > 0. \end{cases}$$

Both conclusions of the Maximum Theorem fail: $f^*$ is not continuous at $\theta = 0$, and
$D^*$ is not usc (or, for that matter, lsc) at $\theta = 0$. □
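A small sketch of the discontinuity in this example, assuming the feasible set is represented by its two endpoints; the function names `feasible_interval` and `f_star` are hypothetical helpers introduced only for illustration.

```python
def feasible_interval(theta):
    """Endpoints of D(theta): [0, 1] at theta = 0 and the single point {0} otherwise."""
    return (0.0, 1.0) if theta == 0 else (0.0, 0.0)

def f_star(theta):
    # f(x, theta) = x is increasing in x, so the maximum is at the right endpoint.
    return feasible_interval(theta)[1]

for theta in (0.0, 1e-9, 1e-3, 0.5, 1.0):
    print(theta, f_star(theta))
```

The value function equals 1 at the parameter value 0 and drops to 0 for every positive parameter, however small, so it cannot be continuous at 0.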


Finally, a small note. The joint continuity of $f$ in $(x, \theta)$, i.e., the continuity of $f$
on $S \times \Theta$, is important for the validity of the Maximum Theorem. This hypothesis
cannot be replaced with one of separate continuity, i.e., that $f(\cdot, \theta)$ is continuous
on $S$ for each fixed $\theta$, and that $f(x, \cdot)$ is continuous on $\Theta$ for each fixed $x$. For an
example, see the Exercises.


9.2.2 The Maximum Theorem under Convexity 


The purpose of this subsection is to highlight some points regarding the Maximum
Theorem under convexity restrictions on the underlying problem. We retain the
notation of the previous subsection. We will prove the following result:


Theorem 9.17 (The Maximum Theorem under Convexity) Suppose $f$ is a continuous
function on $S \times \Theta$ and $D$ is a compact-valued continuous correspondence
on $\Theta$. Let

$$f^*(\theta) = \max\{ f(x, \theta) \mid x \in D(\theta) \}$$
$$D^*(\theta) = \arg\max\{ f(x, \theta) \mid x \in D(\theta) \} = \{ x \in D(\theta) \mid f(x, \theta) = f^*(\theta) \}.$$

Then:

1. $f^*$ is a continuous function on $\Theta$, and $D^*$ is a usc correspondence on $\Theta$.

2. If $f(\cdot, \theta)$ is concave in $x$ for each $\theta$, and $D$ is convex-valued (i.e., $D(\theta)$ is a convex
set for each $\theta$), then $D^*$ is a convex-valued correspondence. When "concave" is
replaced by "strictly concave," then $D^*$ is a single-valued usc correspondence,
hence a continuous function.

3. If $f$ is concave on $S \times \Theta$, and $D$ has a convex graph, then $f^*$ is a concave function,
and $D^*$ is a convex-valued usc correspondence. If "concave" is replaced by
"strictly concave," then $f^*$ is also strictly concave, and $D^*$ is single-valued
everywhere, and therefore, a continuous function.
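Proof Part 1 is just the statement of the Maximum Theorem. To see Part 2, suppose
$x, \hat{x} \in D^*(\theta)$. Let $x' = \lambda x + (1 - \lambda)\hat{x}$ for some $\lambda \in (0, 1)$. Since $D(\theta)$ is convex,
$x' \in D(\theta)$. Then,

$$\begin{aligned}
f(x', \theta) &= f[\lambda x + (1 - \lambda)\hat{x}, \theta] \\
&\ge \lambda f(x, \theta) + (1 - \lambda) f(\hat{x}, \theta) \\
&= \lambda f^*(\theta) + (1 - \lambda) f^*(\theta) \\
&= f^*(\theta),
\end{aligned}$$

so, by the definition of $f^*$, we must also have $x' \in D^*(\theta)$. If $f(\cdot, \theta)$ is strictly concave,
then, since $D(\theta)$ is a convex set, $f(\cdot, \theta)$ has a unique maximizer, so $D^*$ must be
single-valued. This completes the proof of Part 2.

Finally, to see Part 3, let $\theta, \hat{\theta} \in \Theta$, and let $\theta' = \lambda\theta + (1 - \lambda)\hat{\theta}$ for some $\lambda \in (0, 1)$.
Pick any $x \in D^*(\theta)$, and $\hat{x} \in D^*(\hat{\theta})$. Let $x' = \lambda x + (1 - \lambda)\hat{x}$. Then, since $x \in D(\theta)$
and $\hat{x} \in D(\hat{\theta})$, and $D(\cdot)$ has a convex graph, we must have $x' \in D(\theta')$. Since $x'$ is
feasible, but not necessarily optimal at $\theta'$, we have by the concavity of $f$,

$$\begin{aligned}
f^*(\theta') &\ge f(x', \theta') \\
&= f(\lambda x + (1 - \lambda)\hat{x}, \lambda\theta + (1 - \lambda)\hat{\theta}) \\
&\ge \lambda f(x, \theta) + (1 - \lambda) f(\hat{x}, \hat{\theta}) \\
&= \lambda f^*(\theta) + (1 - \lambda) f^*(\hat{\theta}).
\end{aligned}$$

This establishes the concavity of $f^*$. If $f$ is strictly concave, then the second
inequality in this string becomes strict, proving strict concavity of $f^*$. □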


Once again, simple examples show that the result cannot be strengthened. For 
instance, Part 2 of the theorem assumes that f is concave and D is convex-valued; 
while the convex-valuedness of D is inherited by D*, the theorem makes no claim 
regarding the concavity of f*. In fact, the hypotheses of Part 2 are insufficient to 
obtain concavity of f*; the stronger assumption made in Part 3 that D has a convex 
graph is needed. Here is an example: 


Example 9.18 Let $S = \Theta = [0, 1]$. Define $f\colon S \times \Theta \to \mathbb{R}$ and $D\colon \Theta \to P(S)$ by
$f(x, \theta) = x$ for all $(x, \theta) \in S \times \Theta$, and $D(\theta) = [0, \theta^2]$. Then, $f$ is concave on $S$ for
each fixed $\theta$,⁵ and $D(\cdot)$ is convex-valued for each $\theta$. Note, however, that $D(\cdot)$ does
not have a convex graph. Since $f$ is strictly increasing on $S$ for each $\theta$, we have

$$D^*(\theta) = \{\theta^2\}, \quad \theta \in \Theta,$$

and, therefore,

$$f^*(\theta) = \theta^2, \quad \theta \in \Theta.$$

Clearly, $f^*$ is not concave on $\Theta$. □

⁵ In fact, $f$ is concave on $S \times \Theta$.
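The failure of concavity can be checked at a single pair of parameters. This is a minimal sketch, assuming the value function of the example is the square of the parameter; the helper `violates_concavity` is a hypothetical name for a one-point test of the concavity inequality.

```python
def f_star(theta):
    """Value function of the example: the maximum of x over [0, theta**2]."""
    return theta ** 2

def violates_concavity(g, a, b, lam=0.5):
    """True if g(lam*a + (1-lam)*b) < lam*g(a) + (1-lam)*g(b)."""
    mid = lam * a + (1 - lam) * b
    return g(mid) < lam * g(a) + (1 - lam) * g(b)

# At the midpoint of 0 and 1: f*(0.5) = 0.25, while the average of f*(0) and f*(1) is 0.5.
print(violates_concavity(f_star, 0.0, 1.0))
```

A single violated inequality is enough to rule out concavity, since concavity requires the inequality at every pair of points and every weight.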
Similarly, although Part 3 of the theorem assumes that $D(\cdot)$ has a convex graph, the
theorem does not claim that $D^*$ also has a convex graph. As the following example
reveals, such a claim would be false.


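Example 9.19 Let $S = \Theta = [0, 1]$. Define $f\colon S \times \Theta \to \mathbb{R}$ and $D\colon \Theta \to P(S)$ by

$$f(x, \theta) = \sqrt{x}, \quad (x, \theta) \in S \times \Theta,$$

and

$$D(\theta) = [0, \sqrt{\theta}], \quad \theta \in \Theta.$$

Then, $f$ is concave on $S \times \Theta$, and is, in fact, strictly concave on $S$ for each fixed $\theta$,
while $D(\cdot)$ has a convex graph. Since $f$ is strictly increasing on $S$ for each fixed $\theta$,
we have

$$D^*(\theta) = \{\sqrt{\theta}\}, \quad \theta \in \Theta.$$

The graph of $D^*(\cdot)$ is clearly not convex. □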
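A short sketch of why the graph of the solution correspondence is not convex here, assuming the feasible set at each parameter is the interval from 0 to the square root of the parameter and the unique maximizer is its right endpoint. The membership helpers are illustrative names, not part of the text.

```python
import math

def in_graph_D(theta, x):
    """(theta, x) lies in the graph of D when 0 <= x <= sqrt(theta)."""
    return 0.0 <= theta <= 1.0 and 0.0 <= x <= math.sqrt(theta)

def in_graph_D_star(theta, x, tol=1e-12):
    """(theta, x) lies in the graph of D* when x equals sqrt(theta)."""
    return 0.0 <= theta <= 1.0 and abs(x - math.sqrt(theta)) <= tol

a, b = (0.0, 0.0), (1.0, 1.0)                   # both lie on the graph of D*
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)    # the midpoint (0.5, 0.5)

print(in_graph_D_star(*a), in_graph_D_star(*b))  # both endpoints are optimal pairs
print(in_graph_D(*mid), in_graph_D_star(*mid))   # midpoint is feasible but not optimal
```

The midpoint of two optimal pairs remains feasible, as the convex graph of the feasible correspondence guarantees, but it is not itself an optimal pair.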


The point of the examples of this subsection—as also of the examples of the 
previous subsection—is simply that it is not possible to obtain all the properties of 
the primitives in the solutions: inevitably, some properties are lost in the process of 
optimization itself. 

Finally, we note that since the maximization of quasi-concave functions over convex
sets also results in a convex set of maximizers, and since strictly quasi-concave
functions also have unique maxima, we have the following obvious corollary of
Theorem 9.17:


Corollary 9.20 Let $f\colon S \times \Theta \to \mathbb{R}$ be continuous, and $D\colon \Theta \to P(S)$ be continuous
and compact-valued. Define $f^*$ and $D^*$ as in Theorem 9.17.

1. Suppose $f(\cdot, \theta)$ is quasi-concave in $x$ for each $\theta$, and $D$ is convex-valued on $\Theta$.
Then, $D^*$ is a convex-valued usc correspondence.

2. If "quasi-concave" is replaced by "strictly quasi-concave," $D^*$ is single-valued
everywhere on $\Theta$, and hence defines a continuous function.
9.3 An Application to Consumer Theory


This section is devoted to examining an application of the Maximum Theorem and the 
Maximum Theorem under convexity to the problem of utility maximization subject 
to a budget constraint. The objective is to examine under what conditions it is that 
maximized utility varies continuously with changes in underlying parameters (i.e., 
the prices and income), and how optimal demand varies as these parameters change. 

There are $l$ commodities in the model, which may be consumed in nonnegative
amounts. We will assume that the price of each commodity is always strictly positive,
so $\mathbb{R}^l_{++}$ will represent our price space. Similarly, income will also be assumed to be
strictly positive always, so $\mathbb{R}_{++}$ will represent the space of possible income levels.
Define $\Theta = \mathbb{R}^l_{++} \times \mathbb{R}_{++}$. The set $\Theta$ will serve as our parameter space; a typical
element of $\Theta$ will be represented by $(p, I)$.

Let $S = \mathbb{R}^l_+$ be the set of all possible consumption bundles. Let $u\colon S \to \mathbb{R}$ be a
continuous utility function. The budget correspondence $B\colon \Theta \to P(S)$ is given by:

$$B(p, I) = \{ x \in \mathbb{R}^l_+ \mid p \cdot x \le I \}.$$

Now, define the indirect utility function $v(\cdot)$ and the demand correspondence $x(\cdot)$
by:

$$v(p, I) = \max\{ u(x) \mid x \in B(p, I) \}$$
$$x(p, I) = \{ x \in B(p, I) \mid u(x) = v(p, I) \}.$$

The function $v(\cdot)$ and the correspondence $x(\cdot)$ form the objects of our study in this
section. We proceed in two steps. First, in subsection 9.3.1, we establish that the
budget correspondence $B$ is a compact-valued continuous correspondence on $\Theta$.
Then, in subsection 9.3.2, we examine the properties of $v$ and $x$ in the parameters
$(p, I)$, especially when additional convexity restrictions are placed on the primitives.


9.3.1 Continuity of the Budget Correspondence 


We will prove the following result:

Theorem 9.21 The correspondence $B\colon \Theta \to P(S)$ is a continuous, compact-valued
and convex-valued correspondence.


Proof Compact-valuedness is obvious, since prices are assumed to be strictly pos- 
itive. Convex-valuedness is also obvious. The continuity of B is established through 
the following lemmata: 


Lemma 9.22 $B\colon \Theta \to P(S)$ is usc.

Proof Let $V \subset \mathbb{R}^l$ be an open set such that $B(p, I) \subset V$. Define an $\epsilon$-neighborhood
$N_\epsilon(p, I)$ of $(p, I)$ in $\Theta$ by

$$N_\epsilon(p, I) = \{ (p', I') \in \mathbb{R}^l_{++} \times \mathbb{R}_{++} \mid \|(p', I') - (p, I)\| < \epsilon \}.$$

Suppose $B$ is not usc at $(p, I)$. Then, for all $\epsilon > 0$, there is $(p', I')$ in $N_\epsilon(p, I)$ such
that $B(p', I')$ is not contained in $V$, i.e., there exists $x'$ such that $x' \in B(p', I')$,
$x' \notin V$. We will show a contradiction must occur.

Choose a sequence $\epsilon(n) \to 0$, and let $(p_n, I_n) \in N_{\epsilon(n)}(p, I)$, with $x_n \in B(p_n, I_n)$
but $x_n \notin V$. We will first argue that the $\{x_n\}$ sequence contains a convergent
subsequence, by showing that the sequence lies in a compact set.

Since $p \gg 0$, there is $\eta > 0$ such that $p_i > 2\eta$, $i = 1, \ldots, l$. Since $p_n \to p$, we
have $p_{in} \to p_i$ for each $i = 1, \ldots, l$. Therefore, there is $n^*$ such that for all $n \ge n^*$,
we have

$$p_{in} > \eta, \quad i = 1, \ldots, l.$$

Since $I_n \to I$, we can also ensure, by taking $n^*$ large enough, that $I_n \le 2I$ for
$n \ge n^*$. Now, at any $n \ge n^*$, $\eta$ is a lower bound for the price of commodity $i$, since
$p_{in} > \eta$. Moreover, $2I$ is an upper bound on the income at $n$. It follows that for
$n \ge n^*$, we have $x_n \in M$, where $M$ is the compact set defined by

$$M = \Big\{ x \in \mathbb{R}^l_+ \;\Big|\; \eta \sum_{i=1}^{l} x_i \le 2I \Big\}.$$

Therefore, there is a subsequence of $\{x_n\}$, which we shall continue to denote by $\{x_n\}$
for notational simplicity, along which $\{x_n\}$ converges to a limit $x$.

Since $x_n \ge 0$ for all $n$, we also have $x \ge 0$. Moreover, $p_n \cdot x_n \to p \cdot x$, and since
$p_n \cdot x_n \le I_n$ for each $n$ and $I_n \to I$, we also have $p \cdot x \le I$. Therefore, $x \in B(p, I)$,
and since $B(p, I) \subset V$, by hypothesis, we have $x \in V$.

However, $x_n \notin V$ for any $n$, and $V$ is an open set. Therefore, we also have $x \notin V$,
a contradiction. This establishes Lemma 9.22. □


Lemma 9.23 $B\colon \Theta \to P(S)$ is lsc.

Proof Let $(p, I) \in \Theta$ and $V$ be an open set such that $V \cap B(p, I) \ne \emptyset$. Let $x$ be
a point in this intersection. Then $p \cdot x \le I$. Since $V$ is open, $\delta x \in V$ for $\delta < 1$, $\delta$
close to 1. Let $\tilde{x} = \delta x$. Then $p \cdot \tilde{x} < I$: if $x \ne 0$, we have $p \cdot \tilde{x} < p \cdot x \le I$, while if
$x = 0$, we have $p \cdot \tilde{x} = 0 < I$. Suppose there is no neighborhood
of $(p, I)$ in $\Theta$ such that $B(p', I') \cap V \ne \emptyset$ for all $(p', I')$ in the neighborhood. Take
a sequence $\epsilon(n) \to 0$, and pick $(p_n, I_n) \in N_{\epsilon(n)}(p, I)$ such that $B(p_n, I_n) \cap V = \emptyset$.
Now $(p_n \cdot \tilde{x} - I_n) \to (p \cdot \tilde{x} - I) < 0$, so for $n$ sufficiently large, $\tilde{x} \in B(p_n, I_n) \subset V^c$,
a contradiction, since $\tilde{x} \in V$. □

By Lemmata 9.22 and 9.23, $B$ is a continuous correspondence. The theorem is
established. □


9.3.2 The Indirect Utility Function and Demand Correspondence 


The properties of v(-) and x(-) that we shall prove in this subsection are summed up 
in the following theorem: 


Theorem 9.24 The indirect utility function $v(\cdot)$ and the demand correspondence
$x(\cdot)$ have the following properties on $\Theta$:

1. $v(\cdot)$ is a continuous function on $\Theta$, and $x(\cdot)$ is a compact-valued usc
correspondence on $\Theta$.
2. $v(\cdot)$ is nondecreasing in $I$ for fixed $p$, and is nonincreasing in $p$ for fixed $I$.
3. If $u$ is concave, then $v(\cdot)$ is quasi-convex in $p$ for fixed $I$, and is concave in $I$ for
fixed $p$; further, $x(\cdot)$ is a convex-valued correspondence.
4. If $u$ is strictly concave, then $x(\cdot)$ is a continuous function.


Proof We have already shown that $B$ is a continuous and compact-valued
correspondence on $\Theta$. Since $u$ is continuous on $S$ by hypothesis, and $u$ does not depend
on $(p, I)$, $u$ is certainly continuous on $S \times \Theta$. Then, by the Maximum Theorem, $v(\cdot)$
is a continuous function on $\Theta$, and $x(\cdot)$ is a compact-valued usc correspondence on
$\Theta$. This establishes Part 1.

That $v(p, I)$ is nonincreasing in $p$ for fixed $I$ is easily seen: if there are $(p, I)$ and
$(p', I)$ in $\Theta$ such that $p \ge p'$, then $B(p, I) \subset B(p', I)$. Thus, any consumption level
that is feasible at $(p, I)$ is also feasible at $(p', I)$, and it follows that we must have
$v(p', I) \ge v(p, I)$. The other part of Part 2, that $v$ is nondecreasing in $I$ (i.e., that
more income cannot hurt), is similarly established.

Now, suppose $u$ is concave. We will first show using the Maximum Theorem under
Convexity that $v(\cdot)$ must be concave in $I$ for fixed $p$. We will use the notation $B(p, \cdot)$
to denote that $p$ is held fixed, and only the parameter $I$ is allowed to vary.
Note that since $B$ is continuous in $(p, I)$, it is also continuous in $I$ for fixed $p$.

We claim that $B(p, \cdot)$ has a convex graph in $I$. To see this, pick any $I, I'$ in $\mathbb{R}_{++}$,
and let $x \in B(p, I)$ and $x' \in B(p, I')$. Pick any $\lambda \in (0, 1)$. Let $x(\lambda) = \lambda x + (1 - \lambda)x'$,
and $I(\lambda) = \lambda I + (1 - \lambda)I'$. We have to show that $x(\lambda)$ is in $B(p, I(\lambda))$. But this is
immediate since

$$\begin{aligned}
p \cdot x(\lambda) &= p \cdot (\lambda x) + p \cdot ((1 - \lambda)x') \\
&= \lambda p \cdot x + (1 - \lambda) p \cdot x' \\
&\le \lambda I + (1 - \lambda) I' \\
&= I(\lambda).
\end{aligned}$$

Since $u$ is concave in $x$ and is independent of $I$, it is certainly jointly concave in
$(x, I)$. Thus, $v(p, I) = \max\{ u(x) \mid x \in B(p, I) \}$ is concave in $I$ by the Maximum
Theorem under Convexity. This completes the proof of the first part of Part 3.

The proof of the second part, the quasi-convexity of $v$ in $p$ for each fixed value of
$I$, is left as an exercise to the reader. Finally, since $B(p, I)$ is compact and convex
for each $(p, I)$, it is immediate that $x(p, I)$ is a convex set for each $(p, I)$. This
completes the proof of Part 3.

Finally, suppose $u$ is strictly concave. Since $B(p, I)$ is a convex set, $x(p, I)$ is
single-valued for each $(p, I)$, and is thus a continuous function. □


Remark It is left to the reader to check that if $p$ is also allowed to vary, the graph
of $B$ could fail to be convex in the parameters. In particular, Part 3 of the theorem
cannot be strengthened. It is also left to the reader to examine if it is true that when
$u$ is strictly concave, $v$ is strictly concave in $I$ for fixed $p$, i.e., if Part 4 can be
strengthened.
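The properties in Theorem 9.24 can be illustrated with a concrete utility function. The sketch below assumes a two-good Cobb-Douglas utility $u(x) = x_1^{0.3} x_2^{0.7}$, which is an illustrative choice and not taken from the text; for it, the standard closed-form demand is $x_i = \alpha_i I / p_i$. The function names and numerical values are hypothetical.

```python
import math

def demand(p, income, alpha):
    """Cobb-Douglas demand x_i = alpha_i * income / p_i (alpha sums to one)."""
    return [a * income / p_i for a, p_i in zip(alpha, p)]

def utility(x, alpha):
    """Cobb-Douglas utility: product of x_i ** alpha_i."""
    return math.prod(x_i ** a for x_i, a in zip(x, alpha))

def indirect_utility(p, income, alpha):
    """v(p, I): utility evaluated at the optimal bundle."""
    return utility(demand(p, income, alpha), alpha)

alpha = [0.3, 0.7]
p, income = [1.0, 2.0], 10.0
x_star = demand(p, income, alpha)
v = indirect_utility(p, income, alpha)

# Brute-force comparison along the budget line p . x = income.
grid_best = max(
    utility([x1, max(0.0, (income - p[0] * x1) / p[1])], alpha)
    for x1 in (income / p[0] * k / 1000 for k in range(1001))
)

print(x_star, v, grid_best)
print(indirect_utility(p, 12.0, alpha) >= v)            # more income cannot hurt
print(indirect_utility([1.5, 2.0], income, alpha) <= v)  # higher prices cannot help
```

In this special case the indirect utility is linear in income, hence concave but not strictly concave, which bears on the question raised in the Remark. The grid search is only a sanity check on the closed form, not a proof of optimality.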


9.4 An Application to Nash Equilibrium 
9.4.1 Normal-Form Games 


Unlike perfectly competitive markets in which no single agent has any market power,
and monopoly in which one agent has all the market power, game theory attempts
to study social situations in which there are many agents, each of whom has some
effect on the overall outcome. In this section, we introduce the notion of normal-form
games, and describe the notions of mixed strategies and a Nash equilibrium.

A normal-form game (henceforth, simply game) is specified by:

1. A finite set of players $N$. A generic player is indexed by $i$.
2. For each $i \in N$, a strategy set or action set $S_i$, with typical element $s_i$.
3. For each $i \in N$, a payoff function or reward function $r_i\colon S_1 \times \cdots \times S_n \to \mathbb{R}$.


We will confine our attention in this section to the case where $S_i$ is a finite set for
each $i$. Elements of $S_i$ are also called the pure strategies of player $i$. The set of mixed
strategies available to player $i$ is simply the set of all probability distributions on $S_i$,
and is denoted $\Sigma_i$, i.e.,

$$\Sigma_i = \Big\{ \sigma_i\colon S_i \to [0, 1] \;\Big|\; \sum_{s_i \in S_i} \sigma_i(s_i) = 1 \Big\}.$$

A mixed strategy $\sigma_i$ for player $i$ is interpreted as a strategy under which player $i$
will play the pure action $s_i \in S_i$ with probability $\sigma_i(s_i)$. One of the most important
reasons we allow players to use mixed strategies is that the resulting strategy set
$\Sigma_i$ is convex, and convexity of the set of available strategies for each player plays a
critical role in proving existence of equilibrium. (Note that $S_i$ is not convex, since it
is a finite set.)

The following shorthand notation will be useful. In statements pertaining to $i$, let
the index $-i$ denote "everyone-but-$i$." Let $S = \times_{i \in N} S_i$, and for any $i \in N$, let
$S_{-i} = \times_{j \ne i} S_j$. For any $s_{-i} \in S_{-i}$ and $s_i \in S_i$, $(s_i, s_{-i})$ will represent the obvious
vector in $S$. Similarly, let $\Sigma = \times_{i \in N} \Sigma_i$, and let $\Sigma_{-i} = \times_{j \ne i} \Sigma_j$. Observe that (a) $S$
is a finite set, and (b) $\Sigma$ is a convex set. Observe also that $\Sigma$ is strictly smaller than
the set of probability measures on $S$ (why?).

Given a vector $\sigma = (\sigma_1, \ldots, \sigma_n) \in \Sigma$, the pure-action profile $s = (s_1, \ldots, s_n) \in S$
occurs with probability $\sigma_1(s_1) \times \cdots \times \sigma_n(s_n)$. Thus, player $i$ receives a reward of
$r_i(s)$ with probability $\sigma_1(s_1) \times \cdots \times \sigma_n(s_n)$. Assuming that players seek to maximize
expected reward, we extend the payoff functions to $\Sigma$ by defining for each $i$, and for
any $\sigma = (\sigma_1, \ldots, \sigma_n) \in \Sigma$,

$$r_i(\sigma) = \sum_{s \in S} r_i(s)\, \sigma_1(s_1) \cdots \sigma_n(s_n).$$

It is left to the reader to verify that $r_i$ so defined is a continuous function on $\Sigma$ for
each $i$.
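The extension of payoffs to mixed strategies is a finite sum over pure-action profiles, which is straightforward to compute. The following is a minimal sketch, assuming payoffs are stored in a dictionary keyed by pure profiles and each mixed strategy is a dictionary of probabilities; this data layout and the matching-pennies payoffs are illustrative assumptions.

```python
from itertools import product

def expected_payoff(payoff, sigma):
    """Expected reward: sum over pure profiles s of r_i(s) * sigma_1(s_1) * ... * sigma_n(s_n).

    payoff: dict mapping each pure profile (a tuple) to one player's reward.
    sigma:  list of dicts, sigma[j][s_j] = probability that player j plays s_j.
    """
    total = 0.0
    for s in product(*(list(sigma_j) for sigma_j in sigma)):
        prob = 1.0
        for j, s_j in enumerate(s):
            prob *= sigma[j][s_j]
        total += payoff[s] * prob
    return total

# Player 1's payoffs in matching pennies (an assumed example game).
r1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
sigma = [{"H": 0.5, "T": 0.5}, {"H": 0.25, "T": 0.75}]
value = expected_payoff(r1, sigma)
print(value)
```

Because the sum is a polynomial in the probabilities, the continuity of the extended payoff on the set of mixed-strategy profiles is plausible from the formula itself.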


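Given any $\sigma \in \Sigma$, we define $\hat{\sigma}_i \in \Sigma_i$ to be player $i$'s best response to $\sigma$ if

$$r_i(\hat{\sigma}_i, \sigma_{-i}) \ge r_i(\tilde{\sigma}_i, \sigma_{-i}), \quad \text{for all } \tilde{\sigma}_i \in \Sigma_i.$$

The set of all best responses of player $i$ to $\sigma$ is denoted $BR_i(\sigma)$. Observe that $BR_i(\sigma)$
only depends on $\sigma_{-i}$ and not on the $i$-th coordinate of $\sigma$. Notationally, however, it is
much easier to write $BR_i$ as depending on the entire vector $\sigma$.

A Nash equilibrium of the game is a strategy vector $\sigma^* = (\sigma^*_1, \ldots, \sigma^*_n) \in \Sigma$ such
that for each $i \in N$ we have $\sigma^*_i \in BR_i(\sigma^*)$.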
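The definition can be turned into a direct check for a given profile. The sketch below assumes a two-player matching-pennies game as the test case and relies on the standard fact that expected payoff is linear in a player's own mixed strategy, so it is enough to compare against pure-strategy deviations. The data layout and function names are illustrative assumptions.

```python
from itertools import product

def expected_payoff(payoff, sigma):
    """Sum over pure profiles of the reward times the product of probabilities."""
    total = 0.0
    for s in product(*(list(sigma_j) for sigma_j in sigma)):
        prob = 1.0
        for j, s_j in enumerate(s):
            prob *= sigma[j][s_j]
        total += payoff[s] * prob
    return total

def is_nash(payoffs, sigma, tol=1e-9):
    """True if no player gains by a pure-strategy deviation from sigma.

    payoffs[i] is player i's payoff dict. Pure deviations suffice because the
    expected payoff is linear in the deviating player's own mixed strategy.
    """
    for i, r in enumerate(payoffs):
        current = expected_payoff(r, sigma)
        for s_i in sigma[i]:
            pure = {a: 1.0 if a == s_i else 0.0 for a in sigma[i]}
            deviation = sigma[:i] + [pure] + sigma[i + 1:]
            if expected_payoff(r, deviation) > current + tol:
                return False
    return True

r1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
r2 = {s: -v for s, v in r1.items()}          # zero-sum: player 2 gets the negative

uniform = [{"H": 0.5, "T": 0.5}, {"H": 0.5, "T": 0.5}]
both_heads = [{"H": 1.0, "T": 0.0}, {"H": 1.0, "T": 0.0}]
print(is_nash([r1, r2], uniform), is_nash([r1, r2], both_heads))
```

In this game the uniform profile passes the check, while the pure profile fails because the second player would switch. The example shows why mixed strategies are needed for existence.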

The existence of at least one Nash equilibrium in every finite game is proved below. 
A technical digression is required first in the form of the Brouwer/Kakutani Fixed 
Point Theorem. 


9.4.2 The Brouwer/Kakutani Fixed Point Theorem 


Given a set $X$ and a function $f$ mapping $X$ into itself, a point $x \in X$ is said to be a
fixed point of $f$ if it is the case that $f(x) = x$.


Example 9.25 Let $X = [0, 1]$, and let $f\colon X \to X$ be defined by $f(x) = x^2$. Then,
$f$ has two fixed points on $X$, namely, the points 0 and 1. □


The definition of fixed points for functions generalizes in a natural way to a
definition of fixed points for correspondences $\Phi$ from $X$ into itself: a point $x \in X$ is said
to be a fixed point of $\Phi\colon X \to P(X)$ if it is the case that $x \in \Phi(x)$, that is, if it is the
case that the image of $x$ under $\Phi$ contains $x$. Note that when $\Phi$ is single-valued, this
is precisely the definition of a fixed point for a function.

The following result provides sufficient conditions for a correspondence to possess 
a fixed point: 


Theorem 9.26 (Kakutani's Fixed Point Theorem) Let $X \subset \mathbb{R}^n$ be compact and
convex. If $\Phi\colon X \to P(X)$ is a usc correspondence that has nonempty, compact, and
convex values, then $\Phi$ has a fixed point.


Proof See Smart (1974). □


While the condition that ® be nonempty-valued is of obvious importance, the 
other conditions (that it be usc, compact- and convex-valued) are also critical. More- 
over, neither the compactness nor the convexity of X can be dispensed with. Simple 
examples show that if any of these conditions are not satisfied, a fixed point may not 
exist: 


Example 9.27 Let $X = [0, 2]$, and let $\Phi$ be defined by

$$\Phi(x) = \begin{cases} \{2\}, & x \in [0, 1) \\ \{0, 2\}, & x = 1 \\ \{0\}, & x \in (1, 2]. \end{cases}$$

Then, $X$ is compact and convex, and $\Phi$ is nonempty-valued and usc. However, $\Phi$ is
not convex-valued at 1, and evidently $\Phi$ possesses no fixed point. □


Example 9.28 Let $X = [0, 1]$, and let $\Phi$ be defined by

$$\Phi(x) = \begin{cases} \{1\}, & x = 0 \\ \{0\}, & x > 0. \end{cases}$$

Then, $X$ is compact and convex, and $\Phi$ is nonempty-valued and convex-valued.
However, $\Phi$ is not usc; it also fails to have a fixed point. □


Example 9.29 Let $X = \mathbb{R}$. Define $\Phi$ by $\Phi(x) = [x + 1, x + 3]$. Then, $X$ is convex, and
$\Phi$ is nonempty-valued, convex-valued, and usc (in fact, continuous). However, $X$ is
not compact, and $\Phi$ fails to possess a fixed point. □

Example 9.30 Let $X = [0, 1] \cup [2, 3]$. Let $\Phi$ be given by

$$\Phi(x) = \begin{cases} \{2\}, & x \in [0, 1] \\ \{1\}, & x \in [2, 3]. \end{cases}$$

Then, $X$ is compact, $\Phi$ is nonempty-valued, convex-valued, and usc (in fact,
continuous), but $X$ is not convex, and $\Phi$ does not have a fixed point. □
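The absence of fixed points in Examples 9.27 and 9.30 can be spot-checked on a grid. The sketch below assumes the values of each correspondence are the finite sets stated in those examples and represents them as Python sets; a grid check is only illustrative and does not replace the direct argument. The function names and grid resolution are hypothetical choices.

```python
def phi_927(x):
    """Example 9.27 on X = [0, 2]."""
    if x < 1:
        return {2.0}
    if x == 1:
        return {0.0, 2.0}
    return {0.0}

def phi_930(x):
    """Example 9.30 on X = [0, 1] U [2, 3]."""
    return {2.0} if x <= 1 else {1.0}

def has_fixed_point_on(phi, points):
    """True if some listed point x satisfies x in phi(x)."""
    return any(x in phi(x) for x in points)

grid_927 = [k / 100 for k in range(201)]                                   # points of [0, 2]
grid_930 = [k / 100 for k in range(101)] + [2 + k / 100 for k in range(101)]

print(has_fixed_point_on(phi_927, grid_927), has_fixed_point_on(phi_930, grid_930))
```

In both cases the image of a point always lies on the "other side" of the domain, so no point is contained in its own image.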


In closing this section, it should be noted that Kakutani's Theorem is a generalization
of an earlier fixed point theorem that was proved by L.E.J. Brouwer in 1912 for
functions:


Theorem 9.31 (Brouwer's Fixed Point Theorem) Let $X \subset \mathbb{R}^n$ be compact and
convex, and $f\colon X \to X$ a continuous function. Then $f$ has a fixed point.


Proof See Smart (1974). □
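In one dimension, Brouwer's theorem reduces to the intermediate value theorem applied to $g(x) = f(x) - x$, and a fixed point can be located by bisection. The sketch below assumes $X = [0, 1]$ and uses the cosine function, which maps this interval into itself, as an illustrative choice; the function name `fixed_point_1d` is hypothetical. Bisection is specific to the one-dimensional case and is not how the general theorem is proved.

```python
import math

def fixed_point_1d(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on g(x) = f(x) - x for a continuous f mapping [lo, hi] into itself.

    Since f(lo) >= lo and f(hi) <= hi, g(lo) >= 0 >= g(hi); the loop keeps
    this sign pattern while halving the interval.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_cos = fixed_point_1d(math.cos)                 # cos maps [0, 1] into [cos 1, 1]
x_sq = fixed_point_1d(lambda x: x * x)           # the map of Example 9.25
print(x_cos, x_sq)
```

For the squaring map the procedure converges to the fixed point 0; it finds one fixed point, not all of them.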


9.4.3 Existence of Nash Equilibrium

We shall now show that every finite game has at least one Nash equilibrium.


Theorem 9.32 Every finite game $\Gamma = \{ N, (S_i, r_i)_{i \in N} \}$ has a Nash equilibrium point.


Proof Recall that for any $\sigma \in \Sigma$, $BR_i(\sigma) \subset \Sigma_i$ denotes the set of best responses of
player $i$ to $\sigma$. $BR_i$ evidently defines a correspondence from $\Sigma$ to $\Sigma_i$. Thus, defining
$BR = \times_{i \in N} BR_i$, we obtain a correspondence $BR$ from $\Sigma$ into itself. By definition,
a Nash equilibrium is simply a fixed point of this correspondence. Thus, the theorem
will be proved if we can show that $BR$ and $\Sigma$ satisfy all the conditions of the Kakutani
fixed point theorem. That $\Sigma$ does so is immediate (why is compactness apparent?):


Lemma 9.33 $\Sigma$ is convex and compact.


We will now show that $BR$ has all the properties required by the Kakutani Fixed
Point Theorem.

Lemma 9.34 $BR$ is nonempty-valued, convex-valued, and usc on $\Sigma$.


Proof We show that $BR_i$ is nonempty-valued, convex-valued, and usc for each $i$.
This will obviously establish the result.
Pick any $\sigma \in \Sigma$ and $i \in N$. Then, player $i$ solves:

$$\max\{ r_i(\tilde{\sigma}_i, \sigma_{-i}) \mid \tilde{\sigma}_i \in \Sigma_i \}.$$

This is simply a parametric family of optimization problems in which:
• The parameter space is given by $\Sigma$.
• The action space is given by $\Sigma_i$.
• For $\tilde{\sigma}_i \in \Sigma_i$ and $\sigma = (\sigma_i, \sigma_{-i}) \in \Sigma$, the reward $f(\tilde{\sigma}_i, \sigma)$ is given by $r_i(\tilde{\sigma}_i, \sigma_{-i})$.
• The correspondence of feasible actions $D$ is given by $D(\sigma) = \Sigma_i$ for all $\sigma \in \Sigma$.

Since $\Sigma_i$ is compact and convex, the feasible action correspondence is continuous,
compact-valued, and convex-valued. Since $r_i$ is continuous on $\Sigma_i \times \Sigma_{-i}$, it follows
from the Maximum Theorem that

$$BR_i(\sigma) = \arg\max\{ r_i(\tilde{\sigma}_i, \sigma_{-i}) \mid \tilde{\sigma}_i \in \Sigma_i \}$$

is a nonempty-valued, compact-valued, usc correspondence from $\Sigma$ into $\Sigma_i$.
It remains to be shown that $BR_i$ is convex-valued. Define for any $s_i \in S_i$ and
$\sigma_{-i} \in \Sigma_{-i}$,

$$r_i(s_i, \sigma_{-i}) = \sum_{s_{-i} \in S_{-i}} r_i(s_i, s_{-i}) \prod_{j \ne i} \sigma_j(s_j).$$

Then, for any $\sigma_i \in \Sigma_i$ and $\sigma_{-i} \in \Sigma_{-i}$,

$$r_i(\sigma_i, \sigma_{-i}) = \sum_{s_i \in S_i} r_i(s_i, \sigma_{-i})\, \sigma_i(s_i).$$

It follows that $r_i$ is linear (so concave) on $\Sigma_i$. By the Maximum Theorem under
Convexity, $BR_i$ is also convex-valued, and this completes the proof. □


The existence of a Nash equilibrium in every finite game now follows from
Kakutani's fixed point theorem. □


Remark The existence of a Nash equilibrium in pure strategies can be shown under
the following modified set of assumptions: for each $i$, (i) $S_i$ is a compact and convex
set, and (ii) $r_i$ is quasi-concave on $S_i$ for every fixed $s_{-i} \in S_{-i}$. The arguments
involve essentially a repetition of those above, except that mixed strategies are not
used at any point (so the best-response correspondence maps $S_1 \times \cdots \times S_n$ into
itself). The details are left to the reader. □


9.5 Exercises 


1. Let $S = \Theta = \mathbb{R}_+$. Determine, in each of the following cases, whether $\Phi\colon \Theta \to P(S)$
is usc and/or lsc at each $\theta \in \Theta$:
(a) $\Phi(\theta) = [0, \theta]$, $\theta \in \Theta$.
(b) $\Phi(\theta) = [0, \theta)$, $\theta \in \Theta$.
(c) $\Phi(\theta) = (0, \theta]$, $\theta \in \Theta$.
(d) $\Phi(\theta) = (0, \theta)$ if $\theta > 0$, and $\Phi(0) = \{0\}$.
2. Let $\Phi\colon \mathbb{R} \to P(\mathbb{R})$ be defined as
$\Phi(x) = [-|x|, |x|]$, $x \in \mathbb{R}$.
Determine if $\Phi$ is usc and/or lsc on $\mathbb{R}$.
3. Let $\Phi\colon \mathbb{R}_+ \to P(\mathbb{R})$ be defined as
0, 1/x} ifx>0 
sus ifx=0. 
Determine if $\Phi$ is usc and/or lsc on $\mathbb{R}_+$.
4. Let $\Phi\colon \mathbb{R}_+ \to P(\mathbb{R})$ be defined by
1/x ifx > 0 
oo = {iy ifx=0. 
Determine if $\Phi$ is usc and/or lsc on $\mathbb{R}_+$.
5. Let $\Phi\colon \mathbb{R} \to P(\mathbb{R})$ be defined by
{0,1} ifx #0 
(0,1) ifx =0. 
Determine if $\Phi$ is usc and/or lsc on $\mathbb{R}$.
6. Let $\Phi\colon \mathbb{R} \to P(\mathbb{R})$ be defined by
0,1) ifx 40 
E {0 oa 
Determine if $\Phi$ is usc and/or lsc on $\mathbb{R}$.
7. Define the correspondence $\Phi\colon \mathbb{R}_+ \to P(\mathbb{R}_+)$ as
$\Phi(x) = [x, \infty)$, $x \in \mathbb{R}_+$.
Is this correspondence usc on $\mathbb{R}_+$? Is it lsc? Does it have a closed graph?
8. Let $I = [0, 1]$, and let the correspondence $\Phi\colon I \to P(I)$ be defined by
$\Phi(x) = \{0, x\}$, $x \in I$.
Is this correspondence usc on $I$? Is it lsc? Does it have a closed graph?
9. Let $X = Y = [0, 1]$, and $f\colon X \to Y$ be defined by

$$f(x) = \begin{cases} x & \text{if } x \in (0, 1] \\ 1 & \text{if } x = 0. \end{cases}$$

Examine if $f$ is an upper-semicontinuous and/or lower-semicontinuous function
on $X$. Does $f$ attain a minimum on $X$?
10. Let $X = Y = [0, 1]$, and $f\colon X \to Y$ be defined by
x ifx€{0,1) 
fava | ifx = 1. 


Determine if (a) f is an upper-semicontinuous function on X, and (b) if it attains 
a maximum on X. 


11. Let $X = Y = [0, 1]$, and $f\colon X \to Y$ be defined by

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is irrational} \\ 0 & \text{if } x \text{ is rational.} \end{cases}$$

Examine if $f$ is an upper-semicontinuous and/or lower-semicontinuous function
on $X$.


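12. Let $f\colon \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$ be defined by

$$f(a, x) = (x - 1) - (x - a)^2, \quad (a, x) \in \mathbb{R}^2_+.$$

Define the correspondence $D\colon \mathbb{R}_+ \to P(\mathbb{R}_+)$ by

$$D(a) = \{ y \in \mathbb{R}_+ : y \le a \}, \quad a \in \mathbb{R}_+.$$

Let

$$f^*(a) = \max\{ f(a, x) \mid x \in D(a) \}.$$

Let $D^*(a)$ be the set of maximizers of $f(a, \cdot)$ on $D(a)$. Do the hypotheses of the
Maximum Theorem hold for this problem? Verify, through direct calculation,
whether the conclusions of the Maximum Theorem hold.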


13. Let $S = [0, 2]$, and $\Theta = [0, 1]$. Let $f\colon S \times \Theta \to \mathbb{R}$ be defined by

$$f(x, \theta) = \begin{cases}
0 & \text{if } \theta = 0 \\
x/\theta & \text{if } \theta > 0 \text{ and } x \in [0, \theta) \\
2 - (x/\theta) & \text{if } \theta > 0 \text{ and } x \in [\theta, 2\theta] \\
0 & \text{if } x > 2\theta.
\end{cases}$$

Let the correspondence $D\colon \Theta \to P(S)$ be defined by

$$D(\theta) = \begin{cases}
[0, 1 - 2\theta] & \text{if } \theta \in [0, 1/2) \\
[0, 2 - 2\theta] & \text{if } \theta \in [1/2, 1].
\end{cases}$$

(a) Do $f$ and $D$ meet all the conditions of the Maximum Theorem? If yes, prove
your claim. If no, list all the conditions you believe are violated and explain
precisely why you believe each of them is violated.
(b) Let $D^*\colon \Theta \to P(S)$ be given by

$$D^*(\theta) = \{ x \in D(\theta) \mid f(x, \theta) \ge f(x', \theta) \text{ for all } x' \in D(\theta) \}.$$

Is it the case that $D^*(\theta) \ne \emptyset$ for each $\theta \in \Theta$? If so, determine whether $D^*$
is usc and/or lsc on $\Theta$.


14. Repeat the last problem with the definition of $D$ changed to

$$D(\theta) = \begin{cases}
[0, 1 - 2\theta] & \text{if } \theta \in [0, 1/2) \\
[0, 2\theta - 1] & \text{if } \theta \in [1/2, 1].
\end{cases}$$


15. Let $f\colon S \times \Theta \to \mathbb{R}$ be a continuous function, and $D\colon \Theta \to P(S)$ be a
continuous compact-valued correspondence, where $S \subset \mathbb{R}^n$, $\Theta \subset \mathbb{R}^l$. Suppose the
correspondence $D$ is monotonic in the sense that if $\theta, \theta' \in \Theta$, then

$$\theta \ge \theta' \;\Rightarrow\; D(\theta) \supset D(\theta').$$

That is, all actions feasible at a value of $\theta$ are also feasible at all larger values of
$\theta$.

(a) Show that $f^*(\cdot)$ must then be nondecreasing on $\Theta$.
(b) Examine under what additional conditions $f^*$ will be strictly increasing on
$\Theta$.
(c) Give an example to show that if $D$ is not monotone, then $f^*$ need not be
nondecreasing on $\Theta$.


16. Let $S = \Theta = [0, 1]$. Define $D\colon \Theta \to P(S)$ by $D(\theta) = [0, \theta]$ for all $\theta \in \Theta$. Give
an example of a function $f\colon S \times \Theta \to \mathbb{R}$ such that $f^*(\theta) = \max\{ f(x, \theta) \mid x \in D(\theta) \}$
fails to be continuous at only the point $\theta = 1/2$, or show that no such
example is possible.


17. Let $S = \Theta = \mathbb{R}_+$. Describe a continuous function $f\colon S \times \Theta \to \mathbb{R}$ and a
continuous, compact-valued correspondence $D\colon \Theta \to P(S)$ that possess the
following properties (or show that no such example is possible):

(a) $D$ is an increasing correspondence on $\Theta$, that is, $\theta > \theta'$ implies $D(\theta) \supset D(\theta')$.

(b) There are points $\theta_1, \theta_2, \theta_3$ in $\Theta$ such that $\theta_1 > \theta_2 > \theta_3$, but
$f^*(\theta_1) > f^*(\theta_3) > f^*(\theta_2)$, where $f^*(\theta) = \max\{ f(x, \theta) \mid x \in D(\theta) \}$.


18. Provide examples to show that Part 2 of the Maximum Theorem under Convexity
is not necessarily true if $D$ is not convex-valued, or if $f$ is not necessarily concave.

19. Provide an example to show that Part 3 of the Maximum Theorem under
Convexity is not necessarily true if the graph of $D$ fails to be convex.
20. Give an example of a function $f\colon S \times \Theta \to \mathbb{R}$ and a correspondence $D\colon \Theta \to P(S)$
such that $f$ is not continuous on $S \times \Theta$, $D$ is not continuous on $\Theta$, but $f^*$ is
a continuous function on $\Theta$, and $D^*$ is an upper-semicontinuous correspondence
on $\Theta$, where $f^*$ and $D^*$ are obtained from $f$ and $D$ as in the Maximum Theorem.


21. Suppose we were minimizing $f(x, \theta)$ over $x \in D(\theta)$, where $f$ and $D$ meet all
the conditions of the Maximum Theorem. What can one say about the minimized
value function

$$f_*(\theta) = \min\{ f(x, \theta) \mid x \in D(\theta) \},$$

and the correspondence of minimizers

$$D_*(\theta) = \arg\min\{ f(x, \theta) \mid x \in D(\theta) \}?$$


22. Suppose that in the statement of the Maximum Theorem, the hypothesis on $f$
was weakened to require only the upper-semicontinuity of $f$ on $S \times \Theta$, rather
than full continuity. Assume that the other conditions of the theorem remain
unchanged.

(a) Prove that the maximized value function $f^*$ will also now be an
upper-semicontinuous function on $\Theta$.

(b) Give an example to show that $D^*$ may fail to be usc on $\Theta$.

23. Let $\{ N, (S_i, r_i)_{i \in N} \}$ be a normal-form game in which for each $i$, $S_i$ is a convex
and compact set, and $r_i$ is a continuous concave function on $S$. Show that this
game has a Nash equilibrium in pure strategies, i.e., that there is $s^* \in S$ such
that $r_i(s^*) \ge r_i(s_i, s^*_{-i})$ for any $s_i \in S_i$ and any $i \in N$.

24. Show that the result of the previous question holds even if $r_i$ is only quasi-concave
on $S_i$ for each fixed $s_{-i} \in S_{-i}$.


25. Find all the Nash equilibria of each of the following games:


L2 R2 


Li 0.5 


Ry | (5,0) | (1,1) 


Lı 
Ri 
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L ë k 
(i, 1) 
(2, 2) 


3, 3) 
(G, 1) 


Li 
Ri 


R2 
(0,7) 


Ly 
(8, 8): 


(7, 0) 


Li 


(7,7) 


Ri 
Supermodularity and Parametric Monotonicity


This chapter examines the issue of parametric monotonicity. The objective is to
identify conditions on the primitives of parametric families of optimization problems
under which the optimal action correspondence $D^*(\cdot)$ varies monotonically with the
parameter $\theta$, that is, under which increases in the parameter $\theta$ result in increases in
the optimal actions.

Implicit in the very question of parametric monotonicity is the idea of an order
structure on the parameter space $\Theta$ and the action space $S$. In subsection 10.1.1 below,
we identify the structure we will require these spaces to have. This is followed in
subsection 10.1.2 by the definition of the two key conditions of supermodularity and
increasing differences that will be placed on the objective function.

Section 10.2 then presents the main result of this chapter, a set of sufficient
conditions on the primitives of a parametric family of optimization problems under which
parametric monotonicity obtains. A rough summary of this result is as follows.
Suppose $S \subset \mathbb{R}^n$ and $\Theta \subset \mathbb{R}^l$. Let $D^2 f(x, \theta)$ represent the $(n + l) \times (n + l)$ matrix of
second-partials of the objective function $f$:

$$D^2 f(x, \theta) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2}(x, \theta) & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial \theta_l}(x, \theta) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial \theta_l \partial x_1}(x, \theta) & \cdots & \dfrac{\partial^2 f}{\partial \theta_l^2}(x, \theta)
\end{bmatrix}.$$

Then, under some technical regularity conditions, optimal actions increase in the
parameter vector $\theta$ whenever all the off-diagonal terms in the matrix $D^2 f(x, \theta)$ are
nonnegative. The remarkable part of the result is that it involves no restrictions on
the diagonal terms of this matrix; in particular, there are no convexity assumptions
required.

Finally, Section 10.3 provides an example illustrating the value of the results of
this chapter.


10.1 Lattices and Supermodularity 


Recall our basic notation that, given any two vectors $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$ in $\mathbb{R}^m$, we have

\[
\begin{aligned}
x = y, &\quad \text{if } x_i = y_i, \ i = 1, \ldots, m \\
x \ge y, &\quad \text{if } x_i \ge y_i, \ i = 1, \ldots, m \\
x > y, &\quad \text{if } x \ge y \text{ and } x \ne y \\
x \gg y, &\quad \text{if } x_i > y_i, \ i = 1, \ldots, m.
\end{aligned}
\]

Except in the case $m = 1$, the ordering given by "$\ge$" on $\mathbb{R}^m$ is incomplete. Nonetheless, this ordering suffices to define the restrictions that we will need to obtain parametric monotonicity. We do this in two stages. In subsection 10.1.1, we discuss the conditions we will require the action space $S$ to satisfy. Then, in subsection 10.1.2, we describe the assumptions that will be placed on the objective function $f$.


10.1.1 Lattices 
Given two points $x$ and $y$ in $\mathbb{R}^m$, we define the meet of $x$ and $y$, denoted $x \wedge y$, to be the coordinate-wise minimum of $x$ and $y$:

\[ x \wedge y = (\min\{x_1, y_1\}, \ldots, \min\{x_m, y_m\}). \]

In corresponding fashion, the join of $x$ and $y$, denoted $x \vee y$, is defined to be the coordinate-wise maximum of the points $x$ and $y$:

\[ x \vee y = (\max\{x_1, y_1\}, \ldots, \max\{x_m, y_m\}). \]

It is always true that $x \vee y \ge x$; equality occurs if and only if $x \ge y$. Similarly, it is always true that $x \wedge y \le x$, with equality if and only if $x \le y$. In particular, it is the case that

\[
\begin{aligned}
x \not\ge y &\ \Rightarrow\ (x \vee y) > x \\
x \not\le y &\ \Rightarrow\ (x \wedge y) < x.
\end{aligned}
\]

These elementary observations will come in handy in the proof of parametric monotonicity in Section 10.2.

A set $X \subset \mathbb{R}^m$ is said to be a sublattice of $\mathbb{R}^m$ if the meet and join of any two points in $X$ is also in $X$. That is, $X$ is a sublattice of $\mathbb{R}^m$ if

\[ x, y \in X \ \Rightarrow\ \{(x \wedge y) \in X \text{ and } (x \vee y) \in X\}. \]

For instance, any interval in $\mathbb{R}$ is a sublattice of $\mathbb{R}$, while the unit square

\[ I^2 = \{(x, y) \in \mathbb{R}^2 \mid 0 \le x \le 1,\ 0 \le y \le 1\} \]

and the hyperplane

\[ H = \{(x, y) \in \mathbb{R}^2 \mid x = y\} \]

are sublattices of $\mathbb{R}^2$. On the other hand, the hyperplane

\[ H' = \{(x, y) \in \mathbb{R}^2 \mid x + y = 1\} \]

is not a sublattice of $\mathbb{R}^2$: we have $(1, 0) \in H'$ and $(0, 1) \in H'$, but the meet $(0, 0)$ and the join $(1, 1)$ of these two points are not contained in $H'$.

A sublattice $X \subset \mathbb{R}^m$ is said to be a compact sublattice in $\mathbb{R}^m$ if $X$ is also a compact set in the Euclidean metric. Thus, any compact interval is a compact sublattice in $\mathbb{R}$, while the unit square $I^2$ is a compact sublattice in $\mathbb{R}^2$. On the other hand, the open interval $(0, 1)$ is a noncompact sublattice of $\mathbb{R}$, while the set $\mathbb{R}^m_+$ is a noncompact sublattice of $\mathbb{R}^m$.
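The definitions of meet, join, and sublattice can be checked mechanically on finite sets of points. The following is a minimal sketch, not part of the original text: the function names and the integer grids standing in for the unit square, $H$, and $H'$ are illustrative assumptions.

```python
from itertools import product

def meet(x, y):
    """Coordinate-wise minimum of two points."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Coordinate-wise maximum of two points."""
    return tuple(max(a, b) for a, b in zip(x, y))

def is_sublattice(points):
    """True if the finite set of points is closed under meet and join."""
    pts = set(points)
    return all(meet(x, y) in pts and join(x, y) in pts
               for x, y in product(pts, repeat=2))

grid = range(5)
square = [(x, y) for x in grid for y in grid]   # grid analogue of the unit square
diagonal = [(x, x) for x in grid]               # grid analogue of H: x = y
antidiagonal = [(x, 4 - x) for x in grid]       # grid analogue of H': x + y = constant

print(is_sublattice(square), is_sublattice(diagonal), is_sublattice(antidiagonal))
# prints: True True False
```

The anti-diagonal fails for the same reason as $H'$: the meet and join of two of its distinct points leave the set.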


A point $x^* \in X$ is said to be a greatest element of a sublattice $X$ if it is the case that $x^* \ge x$ for all $x \in X$. A point $x_* \in X$ is said to be a least element of a sublattice $X$ if it is the case that $x_* \le x$ for all $x \in X$.

It is an elementary matter to construct sublattices that admit no greatest and/or least element. For example, consider the open unit square $I^o$ in $\mathbb{R}^2$:

\[ I^o = \{(x, y) \in \mathbb{R}^2 \mid 0 < x < 1,\ 0 < y < 1\}. \]

The following result offers sufficient conditions for a sublattice in $\mathbb{R}^m$ to admit a greatest element; it plays an important role in the proof of our main result in Section 10.2:

Theorem 10.1 Suppose $X$ is a nonempty, compact sublattice of $\mathbb{R}^m$. Then, $X$ has a greatest element and a least element.

Proof See Birkhoff (1967). □


10.1.2 Supermodularity and Increasing Differences 


Let $S$ and $\Theta$ be subsets of $\mathbb{R}^n$ and $\mathbb{R}^l$, respectively. We will assume, through the rest of this chapter, that $S$ and $\Theta$ are both sublattices.

A function $f: S \times \Theta \to \mathbb{R}$ is said to be supermodular in $(x, \theta)$ if it is the case that for all $z = (x, \theta)$ and $z' = (x', \theta')$ in $S \times \Theta$, we have

\[ f(z) + f(z') \le f(z \vee z') + f(z \wedge z'). \]

If the inequality becomes strict whenever $z$ and $z'$ are not comparable under the ordering $\ge$, then $f$ is said to be strictly supermodular in $(x, \theta)$.¹


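Example 10.2 Let $S = \Theta = \mathbb{R}_+$, and let $f: S \times \Theta \to \mathbb{R}$ be given by

\[ f(x, \theta) = x\theta. \]

Pick any $(x, \theta)$ and $(x', \theta')$ in $S \times \Theta$, and assume without loss that $x \ge x'$. Suppose first that $\theta \ge \theta'$. Then, $(x, \theta) \vee (x', \theta') = (x, \theta)$ and $(x, \theta) \wedge (x', \theta') = (x', \theta')$. Therefore, it is trivially the case that

\[ f(x, \theta) + f(x', \theta') \le f((x, \theta) \vee (x', \theta')) + f((x, \theta) \wedge (x', \theta')). \]

Now suppose, alternatively, that $\theta < \theta'$. Then, $(x, \theta) \vee (x', \theta') = (x, \theta')$ and $(x, \theta) \wedge (x', \theta') = (x', \theta)$. Therefore,

\[ f((x, \theta) \vee (x', \theta')) + f((x, \theta) \wedge (x', \theta')) = x\theta' + x'\theta. \]

Now, $(x\theta' + x'\theta) - (x\theta + x'\theta') = x(\theta' - \theta) - x'(\theta' - \theta) = (x - x')(\theta' - \theta) \ge 0$, so it is true in this case also that

\[ f(x, \theta) + f(x', \theta') \le f((x, \theta) \vee (x', \theta')) + f((x, \theta) \wedge (x', \theta')). \]

Therefore, $f(x, \theta) = x\theta$ is supermodular on $S \times \Theta$. In fact, since strict inequality holds whenever the two points are not comparable, that is, whenever $x < x'$ and $\theta > \theta'$ (or $x > x'$ and $\theta < \theta'$), $f$ is strictly supermodular on $S \times \Theta$. □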
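The case analysis of Example 10.2 can also be confirmed by brute force on a finite grid. The sketch below is illustrative only: the integer grid standing in for $\mathbb{R}_+ \times \mathbb{R}_+$ and the helper names `gap` and `comparable` are assumptions made here, not notation from the text.

```python
from itertools import product

def f(x, theta):
    return x * theta

def gap(z, zp):
    """f(z v z') + f(z ^ z') - f(z) - f(z'); supermodularity requires gap >= 0."""
    hi = tuple(max(a, b) for a, b in zip(z, zp))   # join of z and z'
    lo = tuple(min(a, b) for a, b in zip(z, zp))   # meet of z and z'
    return f(*hi) + f(*lo) - f(*z) - f(*zp)

def comparable(z, zp):
    """True if z >= z' or z' >= z in the coordinate-wise order."""
    return (all(a >= b for a, b in zip(z, zp))
            or all(a <= b for a, b in zip(z, zp)))

points = list(product(range(5), repeat=2))          # pairs (x, theta)
pairs = [(z, zp) for z in points for zp in points]

supermodular = all(gap(z, zp) >= 0 for z, zp in pairs)
strict = all(gap(z, zp) > 0 for z, zp in pairs if not comparable(z, zp))
print(supermodular, strict)   # prints: True True
```

For comparable pairs the gap is exactly zero, which is the observation recorded in the footnote to the definition of strict supermodularity.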


Supermodularity of $f$ in $(x, \theta)$ is the key notion that goes into obtaining parametric monotonicity. It turns out, however, that two implications of supermodularity, which are summarized in Theorem 10.3 below, are all we really need. A definition is required before we can state the exact result.

A function $f: S \times \Theta \to \mathbb{R}$ is said to satisfy increasing differences in $(x, \theta)$ if for all pairs $(x, \theta)$ and $(x', \theta')$ in $S \times \Theta$, it is the case that $x \ge x'$ and $\theta \ge \theta'$ implies

\[ f(x, \theta) - f(x', \theta) \ge f(x, \theta') - f(x', \theta'). \]

If the inequality becomes strict whenever $x > x'$ and $\theta > \theta'$, then $f$ is said to satisfy strictly increasing differences in $(x, \theta)$.

In words, $f$ has increasing differences in $(x, \theta)$ if the difference

\[ f(x, \theta) - f(x', \theta) \]

between the values of $f$ evaluated at the larger action $x$ and the lesser action $x'$ is itself an increasing function of the parameter $\theta$.

¹If $z$ and $z'$ are comparable under $\ge$ (i.e., if either $z \ge z'$ or $z' \ge z$), then a simple computation shows that both sides of the defining inequality are equal to $f(z) + f(z')$, so a strict inequality is impossible.

Theorem 10.3 Suppose $f: S \times \Theta \to \mathbb{R}$ is supermodular in $(x, \theta)$. Then:

1. $f$ is supermodular in $x$ for each fixed $\theta$, i.e., for any fixed $\theta \in \Theta$, and for any $x$ and $x'$ in $S$, we have $f(x, \theta) + f(x', \theta) \le f(x \vee x', \theta) + f(x \wedge x', \theta)$.
2. $f$ satisfies increasing differences in $(x, \theta)$.

Proof Part 1 is trivial. To see Part 2, pick any $z = (x, \theta)$ and $z' = (x', \theta')$ that satisfy $x \ge x'$ and $\theta \ge \theta'$. Let $w = (x, \theta')$ and $w' = (x', \theta)$. Then, $w \vee w' = z$ and $w \wedge w' = z'$. Since $f$ is supermodular on $S \times \Theta$,

\[ f(w) + f(w') \le f(z) + f(z'). \]

Rearranging, and using the definitions of $w$ and $w'$, this is the same as

\[ f(x, \theta) - f(x', \theta) \ge f(x, \theta') - f(x', \theta'), \]

so $f$ satisfies increasing differences also, as claimed. □

A partial converse to Part 2 of Theorem 10.3 is valid. This converse is somewhat peripheral to our purposes here; consequently, in the interests of expositional continuity, we postpone its statement and proof to Section 10.4 below (see Theorem 10.12). Perhaps the most significant aspect of Theorem 10.12 is that it makes for an easy proof of the following very useful result concerning when a twice-differentiable function is supermodular:

Theorem 10.4 Let $Z$ be an open sublattice of $\mathbb{R}^m$. A $C^2$ function $h: Z \to \mathbb{R}$ is supermodular on $Z$ if and only if for all $z \in Z$, we have

\[ \frac{\partial^2 h}{\partial z_i \partial z_j}(z) \ge 0, \quad i, j = 1, \ldots, m,\ i \ne j. \]

Proof See Section 10.4. □


Theorem 10.4 is of particular interest from a computational standpoint: it becomes easy to check for supermodularity of a $C^2$ function. The following example illustrates this point, using a generalization of the function of Example 10.2:

Example 10.5 Let $S \subset \mathbb{R}^2_+$ and $\Theta \subset \mathbb{R}_+$. Let $f: S \times \Theta \to \mathbb{R}$ be given by

\[ f((x, y), \theta) = xy\theta. \]

By a process a little more painful than that employed in Example 10.2, it can be shown that $f$ meets the inequality required to be supermodular in $(x, y, \theta)$. For a simpler procedure to verify this, note that

\[
\frac{\partial^2 f}{\partial x \partial y}(x, y, \theta) = \theta, \quad
\frac{\partial^2 f}{\partial x \partial \theta}(x, y, \theta) = y, \quad \text{and} \quad
\frac{\partial^2 f}{\partial y \partial \theta}(x, y, \theta) = x.
\]

Since $x$, $y$, and $\theta$ are all nonnegative, these cross-partials are all nonnegative, so $f$ is supermodular in $(x, y, \theta)$ by Theorem 10.4. □


10.2 Parametric Monotonicity 
To get an intuitive feel for the role played by the condition of increasing differences in obtaining parametric monotonicity, consider the following problem in the special case where $S \subset \mathbb{R}$ (i.e., where $x$ is a scalar):

\[ \text{Maximize } f(x, \theta) \text{ subject to } x \in S. \]

Suppose that a solution exists for all $\theta \in \Theta$ (for instance, suppose that $f(\cdot, \theta)$ is continuous on $S$ for each fixed $\theta$, and that $S$ is compact). Pick any two points $\theta_1$ and $\theta_2$ in $\Theta$, and suppose that $\theta_1 > \theta_2$. Let $x_1$ and $x_2$ be points that are optimal at $\theta_1$ and $\theta_2$, respectively. Since $x_2$ need not be optimal at $\theta_1$ nor $x_1$ at $\theta_2$, we have

\[ f(x_1, \theta_1) - f(x_2, \theta_1) \ \ge\ 0 \ \ge\ f(x_1, \theta_2) - f(x_2, \theta_2). \]

Suppose $f$ satisfies strictly increasing differences. Suppose further that although $\theta_1 > \theta_2$, parametric monotonicity fails, and we have $x_1 \not\ge x_2$. Since $x$ is a scalar variable, we must then have $x_1 < x_2$. Therefore, the vectors $(x_2, \theta_1)$ and $(x_1, \theta_2)$ satisfy $x_2 > x_1$ and $\theta_1 > \theta_2$. By strictly increasing differences, this implies

\[ f(x_2, \theta_1) - f(x_1, \theta_1) > f(x_2, \theta_2) - f(x_1, \theta_2), \]

which is in direct contradiction to the weak inequalities we obtained above. Thus, we must have $x_1 \ge x_2$, and since $x_1$ and $x_2$ were arbitrary selections from the sets of optimal actions at $\theta_1$ and $\theta_2$, respectively, we have shown that if $S$ is a subset of $\mathbb{R}$, the condition of strictly increasing differences suffices by itself to obtain monotonicity of optimal actions in the parameter.
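The scalar argument above uses no concavity at all, which a small numerical experiment makes visible. The following sketch is an illustration under assumptions made here: the objective $f(x, \theta) = \theta x + g(x)$, the integer action set, and the arbitrary (deliberately non-concave) table `g` are all hypothetical choices. The term $\theta x$ gives strictly increasing differences, since $(x - x')(\theta - \theta') > 0$ whenever $x > x'$ and $\theta > \theta'$, and adding $g(x)$ does not affect this.

```python
def argmax_set(theta, S, g):
    """All maximizers of f(x, theta) = theta * x + g[x] over the finite set S."""
    values = {x: theta * x + g[x] for x in S}
    best = max(values.values())
    return sorted(x for x in S if values[x] == best)

S = range(11)                                   # actions 0, 1, ..., 10
g = [0, 3, -2, 5, 1, -4, 6, 0, 2, -7, 3]        # arbitrary non-concave term
Theta = range(6)                                # parameters 0, 1, ..., 5

D_star = {theta: argmax_set(theta, S, g) for theta in Theta}

# Every optimal action at a larger parameter is at least as large as
# every optimal action at a smaller parameter.
monotone = all(max(D_star[t]) <= min(D_star[t + 1]) for t in range(5))
print(D_star[0], D_star[1], monotone)   # prints: [6] [10] True
```

Integer arithmetic is used so that ties between maximizers, if any, are detected exactly.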


Theorem 10.6 Suppose that the optimization problem

\[ \text{Maximize } f(x, \theta) \text{ subject to } x \in S \]

has at least one solution for each $\theta \in \Theta$. Suppose also that $f$ satisfies strictly increasing differences in $(x, \theta)$. Finally, suppose that $S \subset \mathbb{R}$. Then optimal actions are monotone increasing in the parameter $\theta$.

The arguments leading to Theorem 10.6 do not extend to the case where $S \subset \mathbb{R}^n$ for $n \ge 2$, since now the failure of $x_1$ and $x_2$ to satisfy $x_1 \ge x_2$ does not imply that $x_1 < x_2$. Thus, additional assumptions will be needed in this case. The appropriate condition is, it turns out, that of supermodularity in $x$. The following result is the centerpiece of this chapter:

Theorem 10.7 Let $S$ be a compact sublattice of $\mathbb{R}^n$, $\Theta$ be a sublattice of $\mathbb{R}^l$, and $f: S \times \Theta \to \mathbb{R}$ be a continuous function on $S$ for each fixed $\theta$. Suppose that $f$ satisfies increasing differences in $(x, \theta)$, and is supermodular in $x$ for each fixed $\theta$. Let the correspondence $D^*$ from $\Theta$ to $S$ be defined by

\[ D^*(\theta) = \arg\max \{ f(x, \theta) \mid x \in S \}. \]

1. For each $\theta \in \Theta$, $D^*(\theta)$ is a nonempty compact sublattice of $\mathbb{R}^n$, and admits a greatest element, denoted $x^*(\theta)$.

2. $x^*(\theta_1) \ge x^*(\theta_2)$ whenever $\theta_1 > \theta_2$.

3. If $f$ satisfies strictly increasing differences in $(x, \theta)$, then $x_1 \ge x_2$ for any $x_1 \in D^*(\theta_1)$ and $x_2 \in D^*(\theta_2)$, whenever $\theta_1 > \theta_2$.


Proof Since $f$ is continuous on $S$ for each fixed $\theta$ and $S$ is compact, $D^*(\theta)$ is nonempty for each $\theta$. Fix $\theta$ and suppose $\{x_p\}$ is a sequence in $D^*(\theta)$ converging to $x \in S$. Then, for any $y \in S$, we have

\[ f(x_p, \theta) \ge f(y, \theta) \]

by the optimality of $x_p$. Taking limits as $p \to \infty$, and using the continuity of $f(\cdot, \theta)$, we obtain

\[ f(x, \theta) \ge f(y, \theta), \]

so $x \in D^*(\theta)$. Therefore, $D^*(\theta)$ is closed, and as a closed subset of the compact set $S$, it is also compact. Let $x$ and $x'$ be distinct elements of $D^*(\theta)$. If $x \wedge x' \notin D^*(\theta)$, we must have

\[ f(x \wedge x', \theta) < f(x, \theta) = f(x', \theta). \]

Supermodularity in $x$ then implies

\[ f(x \vee x', \theta) > f(x, \theta) = f(x', \theta), \]

which contradicts the presumed optimality of $x$ and $x'$ at $\theta$. A similar argument also establishes that $x \vee x' \in D^*(\theta)$. Thus, $D^*(\theta)$ is a sublattice of $\mathbb{R}^n$, and as a nonempty, compact sublattice of $\mathbb{R}^n$, admits a greatest element $x^*(\theta)$. This completes the proof of Part 1.

Now, let $\theta_1$ and $\theta_2$ be given with $\theta_1 > \theta_2$. Let $x_1 \in D^*(\theta_1)$ and $x_2 \in D^*(\theta_2)$. Then, we have

\[
\begin{aligned}
0 &\le f(x_1, \theta_1) - f(x_1 \vee x_2, \theta_1) && \text{(by optimality of } x_1 \text{ at } \theta_1\text{)} \\
  &\le f(x_1 \wedge x_2, \theta_1) - f(x_2, \theta_1) && \text{(by supermodularity in } x\text{)} \\
  &\le f(x_1 \wedge x_2, \theta_2) - f(x_2, \theta_2) && \text{(by increasing differences in } (x, \theta)\text{)} \\
  &\le 0 && \text{(by optimality of } x_2 \text{ at } \theta_2\text{)},
\end{aligned}
\]

so equality holds at every point in this string.

Now, suppose $x_1 = x^*(\theta_1)$ and $x_2 = x^*(\theta_2)$. Since equality holds at all points in the string, it is the case that $x_1 \vee x_2$ is also an optimal action at $\theta_1$. If it were not true that $x_1 \ge x_2$, then we would have $x_1 \vee x_2 > x_1$, and this contradicts the definition of $x_1$ as the greatest element of $D^*(\theta_1)$. Thus, we must have $x_1 \ge x_2$, and this establishes Part 2 of the theorem.

To see Part 3, suppose that $x_1$ and $x_2$ are arbitrary selections from $D^*(\theta_1)$ and $D^*(\theta_2)$, respectively. Suppose we did not have $x_1 \ge x_2$. Then, we must have $x_1 \vee x_2 > x_1$ and $x_1 \wedge x_2 < x_2$. If $f$ satisfies strictly increasing differences, then, since $\theta_1 > \theta_2$, we have

\[ f(x_2, \theta_1) - f(x_1 \wedge x_2, \theta_1) > f(x_2, \theta_2) - f(x_1 \wedge x_2, \theta_2), \]

so the third inequality in the string becomes strict, contradicting the equality. □
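The conclusions of Theorem 10.7 can be observed directly in a small two-dimensional problem. The sketch below rests on assumptions introduced here: the action set $S = \{0, 1, 2\}^2$, the parameter set $\{-1, 0, 1\}$, and the objective $f(x, \theta) = (x_1 - 1)(x_2 - 1) + \theta (x_1 + x_2)$ are hypothetical choices. The product term is supermodular in $x$ and the term $\theta(x_1 + x_2)$ gives increasing differences in $(x, \theta)$.

```python
from itertools import product

def f(x, theta):
    x1, x2 = x
    return (x1 - 1) * (x2 - 1) + theta * (x1 + x2)

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def optimal_set(theta, S):
    """Brute-force D*(theta): all maximizers of f(., theta) on the finite set S."""
    best = max(f(x, theta) for x in S)
    return sorted(x for x in S if f(x, theta) == best)

def greatest(points):
    """Join of all points; it lies in the set whenever the set is a sublattice."""
    top = points[0]
    for x in points[1:]:
        top = join(top, x)
    return top

S = list(product(range(3), repeat=2))
Theta = [-1, 0, 1]
D_star = {t: optimal_set(t, S) for t in Theta}

for t in Theta:
    print(t, D_star[t], greatest(D_star[t]))
# prints:
# -1 [(0, 0)] (0, 0)
# 0 [(0, 0), (2, 2)] (2, 2)
# 1 [(2, 2)] (2, 2)
```

At $\theta = 0$ the optimal set contains two points, is closed under meet and join, and has greatest element $(2, 2)$; the greatest elements are nondecreasing in $\theta$, as Part 2 asserts.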


Theorem 10.7 effectively assumes a constant feasible action correspondence $D$ with $D(\theta) = S$ for all $\theta$. The only significant role this assumption plays in the proof is in ensuring the feasibility of the action $x_1 \vee x_2$ at $\theta_1$, and the feasibility of the action $x_1 \wedge x_2$ at $\theta_2$. To wit, the proof used the fact that optimality of $x_1$ at $\theta_1$ implied the inequality

\[ f(x_1, \theta_1) - f(x_1 \vee x_2, \theta_1) \ge 0, \]

and that the optimality of $x_2$ at $\theta_2$ implied the inequality

\[ f(x_1 \wedge x_2, \theta_2) - f(x_2, \theta_2) \le 0. \]

If the feasible correspondence $D$ is allowed to be nonconstant, and if $x_1 \vee x_2 \notin D(\theta_1)$ (say), then we could not conclude from the optimality of $x_1$ at $\theta_1$ that the first of these inequalities holds, since $x_1$ only maximizes $f$ over $D(\theta_1)$. On the other hand, if we directly assume a condition that implies the feasibility of $x_1 \vee x_2$ in $D(\theta_1)$ and of $x_1 \wedge x_2$ in $D(\theta_2)$, then we can obtain a generalization of Theorem 10.7 to the case where $D$ is nonconstant:

Corollary 10.8 Suppose $S$, $\Theta$, and $f$ meet the conditions of Theorem 10.7. Suppose also that $D: \Theta \to P(S)$ satisfies the following conditions:

1. For all $\theta \in \Theta$, $D(\theta)$ is a compact sublattice of $S$.
2. For all $\theta_1$ and $\theta_2$ in $\Theta$, and for all $x_1 \in D(\theta_1)$ and $x_2 \in D(\theta_2)$,

\[ \theta_1 \ge \theta_2 \text{ implies } x_1 \vee x_2 \in D(\theta_1) \text{ and } x_1 \wedge x_2 \in D(\theta_2). \]

Then, if $D^*(\theta) = \arg\max\{ f(x, \theta) \mid x \in D(\theta)\}$, both conclusions of Theorem 10.7 hold.


The conditions on $D$ given in Corollary 10.8 are less forbidding than they might appear at first sight. For instance, if $S = \Theta = \mathbb{R}_+$, the correspondence

\[ D(\theta) = [0, \theta] \]

satisfies them. More generally, if $S = \Theta = \mathbb{R}^n_+$, the correspondence

\[ D(\theta) = \{ x \in S \mid 0 \le x \le \theta \} \]

also meets the conditions of Corollary 10.8.

On the other hand, these conditions are also sufficiently restrictive that many interesting applications are precluded. Consider, for instance, the utility maximization problem

\[ \text{Maximize } u(x) \text{ subject to } x \in B(p, I) = \{ z \in \mathbb{R}^n_+ \mid p \cdot z \le I \}. \]

Letting $x(p, I)$ denote a solution to this problem at the parameters $(p, I)$, a question of some interest is whether, or under what conditions, "own-price" effects on optimal demand are negative, that is, whether $x_i(p, I)$ is nonincreasing in the parameter $p_i$.

It is easy to put this question into the framework of Corollary 10.8 by writing $-p_i$, rather than $p_i$, as the parameter of interest. Then, a nonincreasing own-price effect is the same thing as the optimal demand for commodity $i$ being nondecreasing in the parameter $-p_i$, and this is exactly the manner in which the conclusions of Corollary 10.8 are stated.

Unfortunately, Corollary 10.8 cannot help us obtain an answer. Suppressing the dependence on the remaining parameters, the correspondence of feasible actions $B(-p_i)$ is given by

\[ B(-p_i) = \Big\{ x \in \mathbb{R}^n_+ \ \Big|\ p_i x_i + \sum_{j \ne i} p_j x_j \le I \Big\}. \]

It is easy to check that this correspondence does not meet the conditions of Corollary 10.8; indeed, $B(\cdot)$ is not even a sublattice of $\mathbb{R}^n$.
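The failure of the budget set to be a sublattice takes only one pair of bundles to exhibit. The sketch below uses hypothetical numbers chosen here for illustration (two goods, prices $(1, 1)$, income $1$); the helper `in_budget` is likewise an assumption, not notation from the text.

```python
def in_budget(x, p, income):
    """True if the bundle x is nonnegative and affordable at prices p."""
    affordable = sum(pi * xi for pi, xi in zip(p, x)) <= income
    return all(xi >= 0 for xi in x) and affordable

p, income = (1, 1), 1
x, y = (1, 0), (0, 1)                                  # two affordable bundles
join_xy = tuple(max(a, b) for a, b in zip(x, y))       # (1, 1)
meet_xy = tuple(min(a, b) for a, b in zip(x, y))       # (0, 0)

print(in_budget(x, p, income), in_budget(y, p, income))   # prints: True True
print(in_budget(join_xy, p, income))                       # prints: False
print(in_budget(meet_xy, p, income))                       # prints: True
```

The meet stays affordable, but the join costs twice the available income, so the budget set is not closed under joins.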


10.3 An Application to Supermodular Games


This section offers an application of the theory of the previous section to a class of n-person games known as supermodular games. For further examples along these lines, the reader is referred to Vives (1990), Milgrom and Roberts (1990), Fudenberg and Tirole (1990), or Topkis (1995).


10.3.1 Supermodular Games 


Recall from Section 9.4 that an n-person normal-form game is specified by a strategy set $S_i$ and a payoff function $r_i: S_1 \times \cdots \times S_n \to \mathbb{R}$ for each player $i$, $i = 1, \ldots, n$. As earlier, we will write $S$ for $\times_{i=1}^n S_i$, and $S_{-i}$ to denote the strategy sets of "everyone-but-$i$": $S_{-i} = \times_{j \ne i} S_j$. The notation $(s_i, s_{-i})$ will be used to denote the obvious vector in $S$.

An n-person game is said to be a supermodular game if for each $i$, $S_i$ is a sublattice of some Euclidean space, and $r_i: S \to \mathbb{R}$ is supermodular in $s_i$ for each fixed $s_{-i}$, and satisfies increasing differences in $(s_i, s_{-i})$. The game is said to be strictly supermodular if $r_i$ has strictly increasing differences in $(s_i, s_{-i})$ and is strictly supermodular in $s_i$ for each fixed $s_{-i}$.


Example 10.9 A typical example of a supermodular game is the Bertrand oligopoly model with linear demand curves. In this example, players $i = 1, \ldots, n$ are firms in an oligopolistic market. The firms' products are substitutes, but not perfect substitutes in the eyes of consumers. Thus, each firm has some degree of market power. It is assumed that firms compete by setting prices so as to maximize (own) profit. Suppose the demand curves are given by the following information: if the vector of prices chosen by the $n$ firms is $p = (p_1, \ldots, p_n)$, the demand that arises for firm $i$'s product is

\[ q_i(p) = \alpha_i - \beta_i p_i + \sum_{j \ne i} \gamma_{ij} p_j, \]

where $\beta_i$ and $(\gamma_{ij})_{j \ne i}$ are strictly positive parameters.² Suppose further that it costs firm $i$ a total of $c_i q_i$ to produce $q_i$ units, where $c_i > 0$. Then, firm $i$'s payoff from setting the price $p_i$, given the choices $p_{-i}$ of the other firms, is

\[ r_i(p_i, p_{-i}) = (p_i - c_i)\, q_i(p_i, p_{-i}). \]

A simple calculation shows that $r_i$ satisfies increasing differences in $(p_i, p_{-i})$. Since $p_i$ is unidimensional, $r_i$ also satisfies supermodularity in $p_i$ for each $p_{-i}$. Finally, the strategy space $S_i$ for firm $i$ is simply $\mathbb{R}_+$, the set of all possible prices firm $i$ may charge. Since this is evidently a sublattice of $\mathbb{R}$, the game we have described here is a supermodular game. □

²The positive coefficient on $p_j$ for $j \ne i$ captures the fact that an increase in the price of product $j$ causes some consumers to move to the substitute offered by $i$; the positive coefficient on $p_i$ reflects the demand that will be lost (or that will move to other firms) if $i$ raises prices.
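The calculation behind the increasing-differences claim of Example 10.9 can be spelled out as follows. This is a sketch; the primed prices $p_i'$ and $p_{-i}'$ are notation introduced here for the comparison. Take $p_i \ge p_i'$ and $p_{-i} \ge p_{-i}'$. Since $q_i$ is linear in the rivals' prices, changing $p_{-i}'$ to $p_{-i}$ at a fixed own price changes the payoff by the margin times the induced change in demand:

```latex
\begin{aligned}
r_i(p_i, p_{-i}) - r_i(p_i, p_{-i}')
  &= (p_i - c_i) \sum_{j \ne i} \gamma_{ij} (p_j - p_j'), \\
r_i(p_i', p_{-i}) - r_i(p_i', p_{-i}')
  &= (p_i' - c_i) \sum_{j \ne i} \gamma_{ij} (p_j - p_j').
\end{aligned}
% Subtracting the second line from the first and regrouping terms:
\bigl[ r_i(p_i, p_{-i}) - r_i(p_i', p_{-i}) \bigr]
  - \bigl[ r_i(p_i, p_{-i}') - r_i(p_i', p_{-i}') \bigr]
  = (p_i - p_i') \sum_{j \ne i} \gamma_{ij} (p_j - p_j') \;\ge\; 0 .
```

The final expression is nonnegative because $p_i \ge p_i'$, each $\gamma_{ij} > 0$, and $p_j \ge p_j'$ for every $j \ne i$, which is exactly the increasing-differences inequality for $r_i$ in $(p_i, p_{-i})$.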


10.3.2 The Tarski Fixed Point Theorem 
As is almost always the case, the existence of Nash equilibrium in the class of 
supermodular games will also be proved through the use of a fixed point theorem. 


The relevant theorem in this case is the fixed point theorem due to Tarski, which 
deals with fixed points of monotone functions on lattices. 


Theorem 10.10 (Tarski's Fixed Point Theorem) Let $X$ be a nonempty compact sublattice of $\mathbb{R}^m$. Let $f: X \to X$ be a nondecreasing function, i.e., $x, y \in X$ with $x \ge y$ implies $f(x) \ge f(y)$. Then, $f$ has a fixed point on $X$.

Proof See Tarski (1955). □


There are at least three features of the Tarski fixed point theorem that are worth stressing:

1. Unlike the great majority of fixed point theorems, the Tarski Fixed Point Theorem does not require convexity of the set $X$.

2. Equally unusually, the theorem does not require the map $f$ to be continuous, but merely to be nondecreasing.

3. The theorem is valid for nondecreasing functions $f$, but (somewhat unintuitively) is false for nonincreasing functions $f$. Indeed, it is an elementary task to construct examples of nonincreasing functions from the compact sublattice $[0, 1]$ into itself that do not have a fixed point; we leave this as an exercise.
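On a finite lattice, a fixed point of a nondecreasing map can be found constructively: starting from the greatest element, repeated application of the map produces a decreasing sequence that must stop at a fixed point, and starting from the least element it produces an increasing one. The sketch below illustrates this on the finite sublattice $\{0, \ldots, 10\}^2$; the particular map `g` is a hypothetical example chosen here. It is nondecreasing in both coordinates and maps the lattice into itself, and no notion of continuity is involved.

```python
def g(x):
    """A nondecreasing self-map of {0,...,10}^2 (integer division keeps it on the grid)."""
    x1, x2 = x
    return ((x1 + x2) // 4 + 2, min(x1, 6))

def iterate_to_fixed_point(g, start):
    """Apply g repeatedly until the point stops moving."""
    x = start
    while g(x) != x:
        x = g(x)
    return x

greatest_fp = iterate_to_fixed_point(g, (10, 10))   # start from the greatest element
least_fp = iterate_to_fixed_point(g, (0, 0))        # start from the least element
print(greatest_fp, least_fp)   # prints: (4, 4) (3, 3)
```

This map has more than one fixed point; iteration from the top reaches the greatest of them and iteration from the bottom reaches the least.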


10.3.3 Existence of Nash Equilibrium 


The following result is an easy consequence of Theorem 10.7 on parametric monotonicity and Tarski's Fixed Point Theorem.

Theorem 10.11 Suppose an n-player supermodular game has the property that for each $i \in \{1, \ldots, n\}$, $S_i$ is compact and $r_i$ is continuous on $S_i$ for each fixed $s_{-i}$. Then, the game has a Nash equilibrium.


Proof Given $s \in S$, player $i$ solves³

\[ \text{Maximize } r_i(s_i, s_{-i}) \text{ subject to } s_i \in S_i. \]

Since $r_i$ is supermodular in $s_i$ and has increasing differences in $(s_i, s_{-i})$, and since $S_i$ is a compact sublattice of some Euclidean space, it follows from Part 2 of Theorem 10.7 that the set of maximizers in this problem (i.e., player $i$'s best-response correspondence $BR_i(\cdot)$) admits a greatest element $b_i^*(s)$ at each $s \in S$, and that $b_i^*(s)$ is nondecreasing in $s$. Therefore, so is the map $b^*: S \to S$, where

\[ b^*(s) = (b_1^*(s), \ldots, b_n^*(s)). \]

Since each $S_i$ is a compact sublattice, so is $S$. By Tarski's Fixed Point Theorem, $b^*$ has a fixed point. Any fixed point of $b^*$ is clearly a Nash equilibrium of the game. □

³As in Section 9.4, player $i$'s best-response problem depends on $s$ only through $s_{-i}$. However, it is notationally convenient to write it as a function of $s$, rather than just $s_{-i}$.
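The proof suggests a simple computation on a finite price grid: iterate the greatest-best-response map $b^*$ from the highest price profile until it stops moving. The sketch below applies this to a two-firm Bertrand game of the kind in Example 10.9. The parameter values, the integer price grid (which replaces the strategy space by a compact sublattice), and all function names are assumptions made here for illustration.

```python
def profit(i, prices, a, b, g, c):
    """Firm i's payoff (p_i - c) * (a - b * p_i + g * sum of rivals' prices)."""
    rivals = sum(p for j, p in enumerate(prices) if j != i)
    return (prices[i] - c) * (a - b * prices[i] + g * rivals)

def greatest_best_response(i, prices, grid, params):
    """Largest grid price that maximizes firm i's profit against the rivals' prices."""
    def payoff(p):
        trial = prices[:i] + (p,) + prices[i + 1:]
        return profit(i, trial, **params)
    best = max(payoff(p) for p in grid)
    return max(p for p in grid if payoff(p) == best)

def b_star(prices, grid, params):
    """The map s -> (b_1*(s), ..., b_n*(s)) from the proof."""
    return tuple(greatest_best_response(i, prices, grid, params)
                 for i in range(len(prices)))

params = dict(a=20, b=2, g=1, c=2)   # hypothetical demand and cost parameters
grid = range(21)                      # prices 0, 1, ..., 20
prices = (20, 20)                     # start from the greatest element of S
while b_star(prices, grid, params) != prices:
    prices = b_star(prices, grid, params)

print(prices)   # prints: (8, 8)
```

With these numbers the iteration passes through $(11, 11)$, $(9, 9)$, and stops at $(8, 8)$, where neither firm can gain by moving to any other grid price.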


10.4 A Proof of the Second-Derivative Characterization of Supermodularity 


Our proof of Theorem 10.4 will rely on a characterization of supermodularity in terms of increasing differences that was proved in Topkis (1978). Some new notation and definitions are required.

Let $Z \subset \mathbb{R}^m$. For $z \in Z$, we will denote by $(z_{-ij}, z_i', z_j')$ the vector $z$, but with $z_i$ and $z_j$ replaced by $z_i'$ and $z_j'$, respectively. A function $f: Z \to \mathbb{R}$ will be said to satisfy increasing differences on $Z$ if for all $z \in Z$, for all distinct $i$ and $j$ in $\{1, \ldots, m\}$, and for all $z_i'$, $z_j'$ such that $z_i' \ge z_i$ and $z_j' \ge z_j$, it is the case that

\[ f(z_{-ij}, z_i', z_j') - f(z_{-ij}, z_i', z_j) \ge f(z_{-ij}, z_i, z_j') - f(z_{-ij}, z_i, z_j). \]

In words, $f$ has increasing differences on $Z$ if it has increasing differences in each pair $(z_i, z_j)$ when all other coordinates are held fixed at some value.


Theorem 10.12 A function $f: Z \subset \mathbb{R}^m \to \mathbb{R}$ is supermodular on $Z$ if and only if $f$ has increasing differences on $Z$.

Proof That supermodularity implies increasing differences on $Z$ can be established by a slight modification of the proof of Part 2 of Theorem 10.3. The details are left to the reader.

To see the reverse implication (that increasing differences on $Z$ implies supermodularity on $Z$), pick any $z$ and $z'$ in $Z$. We are required to show that

\[ f(z) + f(z') \le f(z \vee z') + f(z \wedge z'). \]

If $z \ge z'$ or $z \le z'$, this inequality trivially holds (in fact, we have an equality), so suppose $z$ and $z'$ are not comparable under $\ge$. For notational convenience, arrange the coordinates of $z$ and $z'$ so that

\[
\begin{aligned}
z \vee z' &= (z_1', \ldots, z_k', z_{k+1}, \ldots, z_m) \\
z \wedge z' &= (z_1, \ldots, z_k, z_{k+1}', \ldots, z_m').
\end{aligned}
\]

(This rearrangement of the coordinates may obviously be accomplished without loss of generality.) Note that since $z$ and $z'$ are not comparable under $\ge$, we must have $0 < k < m$.

Now for $0 \le i \le j \le m$, define

\[ z^{i,j} = (z_1', \ldots, z_i', z_{i+1}, \ldots, z_j, z_{j+1}', \ldots, z_m'). \]

Then, we have

\[
\begin{aligned}
z^{0,k} &= z \wedge z' \\
z^{k,m} &= z \vee z' \\
z^{0,m} &= z \\
z^{k,k} &= z'.
\end{aligned}
\]

Since $f$ has increasing differences on $Z$, it is the case that for all $0 \le i < k \le j < m$,

\[ f(z^{i+1,j+1}) - f(z^{i,j+1}) \ge f(z^{i+1,j}) - f(z^{i,j}). \]

Therefore, we have for $k \le j < m$,

\[
\begin{aligned}
f(z^{k,j+1}) - f(z^{0,j+1}) &= \sum_{i=0}^{k-1} \left[ f(z^{i+1,j+1}) - f(z^{i,j+1}) \right] \\
&\ge \sum_{i=0}^{k-1} \left[ f(z^{i+1,j}) - f(z^{i,j}) \right] \\
&= f(z^{k,j}) - f(z^{0,j}).
\end{aligned}
\]

Since this inequality holds for all $j$ satisfying $k \le j < m$, the difference $f(z^{k,j}) - f(z^{0,j})$ is nondecreasing in $j$ over this range: the left-hand side is at its highest value at $j = m - 1$, while the right-hand side is at its lowest value when $j = k$. Therefore,

\[ f(z^{k,m}) - f(z^{0,m}) \ge f(z^{k,k}) - f(z^{0,k}). \]

This is precisely the statement that

\[ f(z \vee z') - f(z) \ge f(z') - f(z \wedge z'). \]

Since $z$ and $z'$ were chosen arbitrarily, we have shown that $f$ is supermodular on $Z$.
Theorem 10.12 is proved. □

We proceed with the proof of Theorem 10.4. Let $f$ be a $C^2$ function on $Z \subset \mathbb{R}^m$. By Theorem 10.12, $f$ is supermodular on $Z$ if and only if for all $z \in Z$, for all distinct $i$ and $j$, and for all $\epsilon > 0$ and $\delta > 0$, we have

\[ f(z_{-ij}, z_i + \epsilon, z_j + \delta) - f(z_{-ij}, z_i + \epsilon, z_j) \ge f(z_{-ij}, z_i, z_j + \delta) - f(z_{-ij}, z_i, z_j). \]

Dividing both sides by the positive quantity $\delta$ and letting $\delta \downarrow 0$, we see that $f$ is supermodular on $Z$ if and only if for all $z \in Z$, for all distinct $i$ and $j$, and for all $\epsilon > 0$,

\[ \frac{\partial f}{\partial z_j}(z_{-ij}, z_i + \epsilon, z_j) \ge \frac{\partial f}{\partial z_j}(z_{-ij}, z_i, z_j). \]

Subtracting the right-hand side from the left-hand side, dividing by the positive quantity $\epsilon$, and letting $\epsilon \downarrow 0$, $f$ is seen to be supermodular on $Z$ if and only if for all $z \in Z$, and for all distinct $i$ and $j$, we have

\[ \frac{\partial^2 f}{\partial z_i \partial z_j}(z) \ge 0. \]

Theorem 10.4 is proved. □


10.5 Exercises 
1. Show that the hyperplane $H = \{(x, y) \mid x - y = 1\}$ is a sublattice of $\mathbb{R}^2$.

2. Give an example of a nonempty compact sublattice in $\mathbb{R}^2$ which has at least two greatest elements, or show that no such example is possible.

3. Suppose a closed set $X$ in $\mathbb{R}^2$ has the property that the meet of any two points from $X$ is also in $X$. Suppose also that $X$ is bounded above, i.e., there is $a \in \mathbb{R}^2$ such that $x \le a$ for all $x \in X$. Is it true that $X$ must have a greatest element? Why or why not?

4. Give an example of a set $X$ in $\mathbb{R}^2$ with the property that the join of any two points in $X$ is always in $X$, but there are at least two points $x$ and $y$ in $X$ whose meet is not in $X$.

5. Let $C(S)$ be the set of all continuous functions on $S = [0, 1]$. For $f, g \in C(S)$, let $f \vee g$ and $f \wedge g$ be defined by

\[ (f \vee g)(x) = \max\{f(x), g(x)\} \quad \text{and} \quad (f \wedge g)(x) = \min\{f(x), g(x)\}. \]

Show that $f \vee g \in C(S)$ and $f \wedge g \in C(S)$, so $C(S)$ is a lattice.

6. Suppose in Part 3 of Theorem 10.7, we replaced the condition that $f$ satisfies strictly increasing differences with the condition that $f$ is strictly supermodular. Will the result still hold that $D^*$ is a nondecreasing correspondence? Why or why not?

7. Let $x \in S = [0, 1]$ and $\theta \in \Theta = \mathbb{R}_+$. In each of the following cases, determine if the given function is supermodular in $(x, \theta)$. In each case, determine also if the optimal action correspondence in the problem $\max\{f(x, \theta) \mid x \in S\}$ is nondecreasing in $\theta$.


(a) f(x, 8). = x0 =378% 2 


(b) f(x,0) = x8- x. / 
(c) $f(x, \theta) = x/(1 + \theta)$.
(d) $f(x, \theta) = x(1 + \theta)$.


(e) f(x,0) = x0- x) y 


8. Prove Tarski's Fixed Point Theorem for the case where $S \subset \mathbb{R}$.

9. Show that Tarski's Fixed Point Theorem is false if "nondecreasing" is replaced by "nonincreasing." That is, give an example of a nonincreasing function from a compact lattice into itself that fails to have a fixed point.


11 


Finite-Horizon Dynamic Programming 


11.1 Dynamic Programming Problems 


A dynamic programming problem is an optimization problem in which decisions have 
to be taken sequentially over several time periods. To make the problem non-trivial, it 
is usually assumed that periods are “linked” in some fashion, viz., that actions taken 
in any particular period affect the decision environment (and thereby, the reward 
possibilities) in all future periods. In practice, this is typically achieved by positing 
the presence of a “state” variable, representing the environment, which restricts the 
set of actions available to the decision-maker at any point in time, but which also 
moves through time in response to the decision-maker’s actions. These twin features 
of the state variable provide the problem with “bite”: actions that look attractive 
from the standpoint of immediate reward (for instance, a Caribbean vacation) might
have the effect of forcing the state variable (the consumer’s wealth or savings) into 
values from which the continuation possibilities (future consumption levels) are not 
as pleasant. The modelling and study of this trade-off between current payoffs and 
future rewards is the focus of the theory of dynamic programming. 

In this book, we focus on two classes of dynamic programming problems: Finite-Horizon Markovian Dynamic Programming Problems, which are the subject of this chapter, and Infinite-Horizon Stationary Discounted Dynamic Programming Problems, which we examine in the next chapter.


11.2 Finite-Horizon Dynamic Programming 
A Finite Horizon (Markovian) Dynamic Programming Problem (henceforth FHDP) is defined by a tuple $\{S, A, T, (r_t, f_t, \Phi_t)_{t=1}^T\}$, where

1. $S$ is the state space of the problem, with generic element $s$.
2. $A$ is the action space of the problem, with generic element $a$.
3. $T$, a positive integer, is the horizon of the problem.
4. For each $t \in \{1, \ldots, T\}$,

(a) $r_t: S \times A \to \mathbb{R}$ is the period-$t$ reward function,
(b) $f_t: S \times A \to S$ is the period-$t$ transition function, and
(c) $\Phi_t: S \to P(A)$ is the period-$t$ feasible action correspondence.


The FHDP has a simple interpretation. The decision-maker begins from some fixed initial state $s_1 = s \in S$. The set of actions available to the decision-maker at this state is given by the correspondence $\Phi_1(s_1) \subset A$. When the decision-maker chooses an action $a_1 \in \Phi_1(s_1)$, two things happen. First, the decision-maker receives an immediate reward of $r_1(s_1, a_1)$. Second, the state $s_2$ at the beginning of period 2 is realized as $s_2 = f_1(s_1, a_1)$. At this new state, the set of feasible actions is given by $\Phi_2(s_2) \subset A$, and when the decision-maker chooses an action $a_2 \in \Phi_2(s_2)$, a reward of $r_2(s_2, a_2)$ is received, and the period-3 state $s_3$ is realized as $s_3 = f_2(s_2, a_2)$. The problem proceeds in this way till the terminal date $T$ is reached. The objective is to choose a plan for taking actions at each point in time in order to maximize the sum of the per-period rewards over the horizon of the model, i.e., to solve

\[
\begin{aligned}
\text{Maximize} \quad & \sum_{t=1}^{T} r_t(s_t, a_t) \\
\text{subject to} \quad & s_1 = s \in S \\
& s_t = f_{t-1}(s_{t-1}, a_{t-1}), \quad t = 2, \ldots, T \\
& a_t \in \Phi_t(s_t), \quad t = 1, \ldots, T.
\end{aligned}
\]

A cleaner way to represent this objective (and, as we explain, an analytically more advantageous one) is to employ the notion of a "strategy." We turn to this now.

11.3 Histories, Strategies, and the Value Function 


A strategy for a dynamic programming problem is just a contingency plan, i.e., a
plan that specifies what is to be done at each stage as a function of all that has
transpired up to that point. In more formal language, a $t$-history $\eta_t$ is a vector
$\{s_1, a_1, \ldots, s_{t-1}, a_{t-1}, s_t\}$ of the state $s_\tau$ in each period $\tau$ up to $t$, the action $a_\tau$
taken that period, and the period-$t$ state $s_t$. Let $H_1 = S$, and for $t > 1$, let $H_t$ denote
the set of all possible $t$-histories $\eta_t$. Given a $t$-history $\eta_t$, we will denote by $s_t[\eta_t]$ the
period-$t$ state under the history $\eta_t$.

A strategy $\sigma$ for the problem is a sequence $\{\sigma_t\}_{t=1}^{T}$, where for each $t$, $\sigma_t\colon H_t \to A$
specifies the action $\sigma_t(\eta_t) \in \Phi_t(s_t[\eta_t])$ to be taken in period $t$ as a function of the
history $\eta_t \in H_t$ up to $t$. The requirement that $\sigma_t(\eta_t)$ be an element of $\Phi_t(s_t[\eta_t])$
ensures that the feasibility of actions at all points is built into the definition of a
strategy. Let $\Sigma$ denote the set of all strategies $\sigma$ for the problem.

Each strategy $\sigma \in \Sigma$ gives rise from each initial state $s \in S$ to a unique sequence of
states and actions $\{s_t(\sigma, s), a_t(\sigma, s)\}$, and therefore, to a unique sequence of histories
$\eta_t(\sigma, s)$, in the obvious recursive manner: we have $s_1(\sigma, s) = s$, and for $t = 1, \ldots, T$,

\[
\begin{aligned}
\eta_t(\sigma, s) &= \{s_1(\sigma, s), a_1(\sigma, s), \ldots, s_t(\sigma, s)\} \\
a_t(\sigma, s) &= \sigma_t[\eta_t(\sigma, s)] \\
s_{t+1}(\sigma, s) &= f_t[s_t(\sigma, s), a_t(\sigma, s)].
\end{aligned}
\]


Thus, given an initial state $s_1 = s \in S$, each strategy $\sigma$ gives rise to a unique period-$t$
reward $r_t(\sigma)(s)$ from $s$ defined as

\[
r_t(\sigma)(s) = r_t[s_t(\sigma, s), a_t(\sigma, s)].
\]

The total reward under $\sigma$ from the initial state $s$, denoted $W(\sigma)(s)$, is, therefore,
given by

\[
W(\sigma)(s) = \sum_{t=1}^{T} r_t(\sigma)(s).
\]

Now, define the function $V\colon S \to \mathbb{R}$ by

\[
V(s) = \sup_{\sigma \in \Sigma} W(\sigma)(s).
\]

The function $V$ is called the value function of the problem.¹ A strategy $\sigma^*$ is said to
be an optimal strategy for the problem if the payoff it generates from any initial state
is the supremum over possible payoffs from that state, that is, if

\[
W(\sigma^*)(s) = V(s), \quad \text{for all } s \in S.
\]
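
The recursive construction of states, actions, histories, and total reward can be traced in code. The following is a minimal Python sketch, not part of the original text: the function name `total_reward`, the representation of a history as a flat tuple `(s_1, a_1, ..., s_t)`, and the three-period toy problem at the bottom are illustrative assumptions. The sketch does not check that the chosen actions are feasible.

```python
def total_reward(strategy, rewards, transitions, s):
    """Return W(sigma)(s) and the terminal history generated by sigma from s.

    strategy[t-1]    maps a history tuple (s_1, a_1, ..., s_t) to an action
    rewards[t-1]     is r_t(state, action)
    transitions[t-1] is f_t(state, action); the last one is never called
    """
    history = [s]              # eta_1 = {s_1}
    state = s
    total = 0.0
    horizon = len(rewards)     # T
    for t in range(horizon):
        action = strategy[t](tuple(history))   # a_t = sigma_t[eta_t]
        total += rewards[t](state, action)     # add r_t(s_t, a_t)
        if t < horizon - 1:
            state = transitions[t](state, action)  # s_{t+1} = f_t(s_t, a_t)
            history += [action, state]
    return total, tuple(history)


# Hypothetical toy problem: consume half of the current stock each period.
T = 3
strategy = [lambda h: h[-1] / 2] * T
rewards = [lambda s, a: a] * T
transitions = [lambda s, a: s - a] * T

W, path = total_reward(strategy, rewards, transitions, 8.0)
print(W, path)
```

Because the strategy receives the whole history, the same function can evaluate strategies that condition on past states and actions, not only on the current state.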


In this notation, the objective of the decision-maker in our dynamic programming 
problem can be described as the problem of finding an optimal strategy. Writing the 
problem in this way has (at least) two big advantages over writing it as the problem of 
maximizing rewards over the set of feasible state-action sequences from each initial 
state. 

The first advantage is a simple one. If we succeed in obtaining an optimal strategy, 
this strategy will enable us to calculate the optimal sequence of actions from any 
initial state. In particular, this will enable us to carry out “comparative dynamics” 
exercises, where we can examine the effect of a change in the value of the initial state 
(or some parameter of the problem) on the sequence of optimal actions and states. 

The second advantage is deeper. In the formulation of dynamic programming
problems we have given here, we have assumed that transitions are deterministic,
i.e., that given any period-$t$ state $s_t$ and period-$t$ action $a_t$, the period-$(t+1)$ state is
precisely determined as $f_t(s_t, a_t)$. In a variety of economic problems, it might be more
realistic to replace this assumption with one where $(s_t, a_t)$ determines a probability
distribution over $S$, according to which the period-$(t+1)$ state is realized. When the
transition is allowed to be stochastic in this manner, it may be simply impossible to
pick a sequence of actions a priori, since some of the actions in this sequence may
turn out to be infeasible at the realized states. Moreover, even if infeasibility is not
an issue, picking an a priori sequence of actions leaves us without any flexibility,
in that it is not possible to vary continuation behavior based on the actual (i.e.,
realized) states. This is a serious shortcoming: it will rarely be the
case that a single continuation action is optimal, regardless of which future state is
realized. On the other hand, since a strategy specifies a contingency plan of action,
its prescriptions do not suffer from such problems. We will not be dealing with the
stochastic transitions case in this book, since the technical details involved are quite
messy. Nonetheless, at a conceptual level, the techniques used to show existence
are exactly the same as what we develop here for the deterministic case. By dealing
with strategy spaces rather than sequences of actions, the methods we develop will
continue to be applicable to the more general case.

¹The value function $V$ in this problem is the exact analog of the maximized value function $f^*$ that we
examined in the context of parametric families of optimization problems. Indeed, the initial state $s$ of the
FHDP can be viewed precisely as parametrizing the problem.


11.4 Markovian Strategies 
Any $\tau$-history $\eta_\tau = (s_1, a_1, \ldots, s_\tau)$ in a given FHDP
\[
\{S, A, T, (r_t, \Phi_t, f_t)_{t=1}^{T}\}
\]
results in another FHDP, namely the $(T-\tau)$-period problem given by

\[
\{S, A, T-\tau, (r_t^*, \Phi_t^*, f_t^*)_{t=1}^{T-\tau}\}
\]

whose initial state is $s = s_\tau$, and where, for $t = 1, \ldots, T-\tau$, we have:
\[
\begin{aligned}
r_t^*(s, a) &= r_{t+\tau}(s, a), & (s, a) &\in S \times A \\
\Phi_t^*(s) &= \Phi_{t+\tau}(s), & s &\in S \\
f_t^*(s, a) &= f_{t+\tau}(s, a), & (s, a) &\in S \times A.
\end{aligned}
\]

We shall call the problem $\{S, A, T-\tau, (r_t^*, \Phi_t^*, f_t^*)_{t=1}^{T-\tau}\}$ the $(T-\tau)$-period continuation
problem, and, for notational simplicity, denote it by $\{S, A, T-\tau, (r_t, \Phi_t, f_t)_{t=\tau+1}^{T}\}$.
Clearly, all $\tau$-histories $\eta_\tau$ that end in $s_\tau$ result in the same continuation $(T-\tau)$-period
problem. Therefore, at any point $t$ in an FHDP, the current state $s_t$ encapsulates all
relevant information regarding continuation possibilities from period $t$ onwards, such
as the strategies that are feasible in the continuation and the consequent rewards that
may be obtained.²

²Hence, the adjective “Markovian” to describe these problems.

Since continuation possibilities from a state are not affected by how one arrives
at that state, it appears intuitively plausible that there is no gain to be made by 
conditioning actions on anything more than just the value of the current state and the 
time period in which this state was reached. (This last factor is obviously important; 
to calculate optimal continuations from a given state s requires knowledge of the 
number of time periods remaining, or, equivalently, what the current date is.) This 
leads us to the notion of a Markovian strategy. 

A Markovian strategy is a strategy $\sigma$ in which at each $t = 1, \ldots, T-1$, $\sigma_t$ depends
on the $t$-history $\eta_t$ only through $t$ and the value of the period-$t$ state $s_t[\eta_t]$ under
$\eta_t$. Such a strategy can evidently be represented simply by a sequence $\{g_1, \ldots, g_T\}$,
where for each $t$, $g_t\colon S \to A$ specifies the action $g_t(s_t) \in \Phi_t(s_t)$ to be taken in period
$t$, as a function of only the period-$t$ state $s_t$.
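
The relation between a sequence of state-dependent rules and a strategy defined on histories can be made concrete. The Python sketch below is illustrative only: the helper name `as_strategy` and the flat-tuple encoding of a history, with the current state as its last entry, are assumptions introduced here and do not appear in the text.

```python
def as_strategy(policies):
    """Wrap Markovian rules g_1, ..., g_T as history-based maps sigma_1, ..., sigma_T.

    Each sigma_t ignores everything in the history except its last entry,
    which is the current state s_t.
    """
    # g=g binds each rule at definition time, avoiding the late-binding pitfall.
    return [lambda history, g=g: g(history[-1]) for g in policies]


# Hypothetical two-period rules: consume half, then consume everything.
sigma = as_strategy([lambda s: s / 2, lambda s: s])

# Two different histories ending in the same state prescribe the same action.
same_action = sigma[1]((8.0, 4.0, 4.0)) == sigma[1]((5.0, 1.0, 4.0))
print(same_action)
```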

If a Markovian strategy $\{g_1, \ldots, g_T\}$ is also an optimal strategy, it is called a
Markovian optimal strategy. This terminology is standard, but is a little ambiguous;
to avoid legitimate confusion, it should be stressed that a Markovian optimal strategy
is a strategy that is optimal in the class of all strategies, and not just among Markovian
strategies.


11.5 Existence of an Optimal Strategy 


Let a strategy $\sigma = \{\sigma_1, \ldots, \sigma_T\}$ be given, and suppose $\{g_\tau, \ldots, g_T\}$ is a Markovian
strategy for the $(T-\tau+1)$-period continuation problem. Then, we will use the notation

\[
\{\sigma_1, \ldots, \sigma_{\tau-1}, g_\tau, \ldots, g_T\}
\]

to represent the strategy in which the decision-maker acts according to the
recommendations of $\sigma$ for the first $(\tau-1)$ periods, and then switches to following the dictates
of the strategy $\{g_\tau, \ldots, g_T\}$.

The key to proving the existence of an optimal strategy (indeed, of a Markovian
optimal strategy) is the following lemma:

Lemma 11.1 Let $\sigma = (\sigma_1, \ldots, \sigma_T)$ be an optimal strategy for the FHDP
\[
\{S, A, T, (r_t, f_t, \Phi_t)_{t=1}^{T}\}.
\]
Suppose that for some $\tau \in \{1, \ldots, T\}$, the $(T-\tau+1)$-period continuation problem
\[
\{S, A, T-\tau+1, (r_t, f_t, \Phi_t)_{t=\tau}^{T}\}
\]
admits a Markovian optimal strategy $\{g_\tau, \ldots, g_T\}$. Then, the strategy
\[
\{\sigma_1, \ldots, \sigma_{\tau-1}, g_\tau, \ldots, g_T\}
\]
is an optimal strategy for the original problem.

Proof For notational ease, denote the strategy $\{\sigma_1, \ldots, \sigma_{\tau-1}, g_\tau, \ldots, g_T\}$ by $\gamma$. If
the claim in the lemma were not true, then there would exist an initial state $s$ of the
$T$-period problem, such that the total payoff under $\gamma$ from the state $s$ was strictly
dominated by the total payoff from $s$ under the optimal strategy $\sigma$, i.e., such that

\[
W(\sigma)(s) > W(\gamma)(s).
\]

Letting $r_t(\sigma)(s)$ and $r_t(\gamma)(s)$ denote the period-$t$ rewards under $\sigma$ and $\gamma$, respectively,
from the initial state $s$, we have

\[
W(\sigma)(s) = \sum_{t=1}^{T} r_t(\sigma)(s) = \sum_{t=1}^{\tau-1} r_t(\sigma)(s) + \sum_{t=\tau}^{T} r_t(\sigma)(s),
\]
and
\[
W(\gamma)(s) = \sum_{t=1}^{T} r_t(\gamma)(s) = \sum_{t=1}^{\tau-1} r_t(\gamma)(s) + \sum_{t=\tau}^{T} r_t(\gamma)(s).
\]
By construction, $\sum_{t=1}^{\tau-1} r_t(\sigma)(s) = \sum_{t=1}^{\tau-1} r_t(\gamma)(s)$, since $\sigma$ and $\gamma$ are identical over
the first $(\tau-1)$ periods. Therefore, for $W(\sigma)(s) > W(\gamma)(s)$ to hold, we must have

\[
\sum_{t=\tau}^{T} r_t(\sigma)(s) > \sum_{t=\tau}^{T} r_t(\gamma)(s).
\]

If we now let $s_\tau^*$ denote the common period-$\tau$ state under both $\gamma$ and $\sigma$, this inequality
states that from $s_\tau^*$, the strategy $\{g_\tau, \ldots, g_T\}$ is strictly dominated by the continuation
strategy $\{\sigma_\tau, \ldots, \sigma_T\}$. This is an obvious contradiction of the presumed optimality
of $\{g_\tau, \ldots, g_T\}$ in the $(T-\tau+1)$-period problem: for, when beginning from the initial
state $s_\tau^*$ in the $(T-\tau+1)$-period problem, one could simply behave “as if” the history
$\eta_\tau(\sigma, s)$ had occurred, and follow the continuation strategy $\{\sigma_\tau, \ldots, \sigma_T\}$. This would
result in a strictly larger reward from $s_\tau^*$ than from following $\{g_\tau, \ldots, g_T\}$. □


The importance of Lemma 11.1 arises from the fact that it enables us to solve for
an optimal strategy by the method of backwards induction. That is, we first consider
the one-period problem in which, from any $s \in S$, we solve

\[
\text{Maximize } r_T(s, a) \text{ subject to } a \in \Phi_T(s).
\]

Let $g_T^*(s)$ be a solution to this problem at the state $s$. By Lemma 11.1, the solution
$g_T^*$ can be used in the last period of an optimal strategy, without changing the total
rewards. Now, given that we are going to use $g_T^*$ in the last period, we can find the
actions in the two-period problem beginning at period $T-1$ that will be optimal for
the two-period problem. This gives us a strategy pair $(g_{T-1}^*, g_T^*)$ that is optimal for
the two-period continuation problem beginning at period $T-1$. An induction argument
now completes the construction of the optimal strategy. The formal details of this
method are given below in Theorem 11.2. But first, since even the one-period problem
will not have a solution unless minimal continuity and compactness conditions are
met, we shall impose the following assumptions:


A1 For each $t$, $r_t$ is continuous and bounded on $S \times A$.
A2 For each $t$, $f_t$ is continuous on $S \times A$.
A3 For each $t$, $\Phi_t$ is a continuous, compact-valued correspondence on $S$.


Theorem 11.2 Under A1-A3, the dynamic programming problem admits a Markovian
optimal strategy. The value function $V_t(\cdot)$ of the $(T-t+1)$-period continuation
problem satisfies, for each $t \in \{1, \ldots, T\}$ and $s \in S$, the following condition, known
as the “Bellman Equation,” or the “Bellman Principle of Optimality:”

\[
V_t(s) = \max_{a \in \Phi_t(s)} \{ r_t(s, a) + V_{t+1}[f_t(s, a)] \}.
\]


Proof We begin with the last period. A strategy $g_T$ for the one-period problem can
be optimal if, and only if, at each $s$, $g_T(s)$ solves:

\[
\max \{ r_T(s, a) \mid a \in \Phi_T(s) \}.
\]

Since $r_T$ is continuous on $S \times A$, and $\Phi_T$ is a compact-valued continuous
correspondence, the solution to this problem is well defined. The maximized value of the
objective function is, of course, simply $V_T(s)$. Let $\Phi_T^*(s)$ denote the set of maximizers
at $s$. By the Maximum Theorem, $V_T$ is a continuous function on $S$, and $\Phi_T^*$ is a
nonempty-valued usc correspondence from $S$ into $A$.

Now, pick any function $g_T^*\colon S \to A$ such that $g_T^*(s) \in \Phi_T^*(s)$ for all $s \in S$. (Such
a function is called a selection from $\Phi_T^*$.) Then, at any initial state of the one-period
problem, the function $g_T^*$ recommends an action that is optimal from that state, so
$g_T^*$ is an optimal strategy for the one-period problem. Note that $\Phi_T^*$ may admit many
selections, so the optimal strategy for the one-period problem may not be unique;
however, all optimal strategies for this problem result in the payoff $V_T(s)$ from the
initial state $s$.

Now pick any optimal strategy $g_T^*$ for the one-period problem, and consider the
two-period problem from the initial state $s$ (i.e., when $s$ is the state at the beginning of
period $T-1$). If the action $a \in \Phi_{T-1}(s)$ is now taken, the immediate reward received
is $r_{T-1}(s, a)$. The period-$T$ state is then realized as $f_{T-1}(s, a)$. The maximum
one-period reward from the state $f_{T-1}(s, a)$ in the continuation one-period problem is,
by definition, $V_T[f_{T-1}(s, a)]$, which can be attained using the action recommended
by $g_T^*$ at this state. Thus, given that the action $a$ is taken at state $s$ in the first period
of the two-period model, the maximum reward that can be obtained is

\[
r_{T-1}(s, a) + V_T[f_{T-1}(s, a)].
\]
It now follows that $\{g_{T-1}^*, g_T^*\}$ is an optimal strategy for the two-period problem if,
and only if, at each $s \in S$, $g_{T-1}^*(s)$ solves

\[
\max_{a \in \Phi_{T-1}(s)} \{ r_{T-1}(s, a) + V_T[f_{T-1}(s, a)] \}.
\]

Since $f_{T-1}$ is continuous on $S \times A$ by assumption, and we have shown that $V_T$ is
continuous on $S$, it must be the case that $V_T[f_{T-1}(s, a)]$ is continuous in $(s, a)$. Since
$r_{T-1}$ is also continuous on $S \times A$, the objective function in this problem is continuous
on $S \times A$. By hypothesis, $\Phi_{T-1}$ is a continuous, compact-valued correspondence.
Thus, invoking the Maximum Theorem again, we see that the maximum is well
defined, i.e., the correspondence of maximizers $\Phi_{T-1}^*$ is nonempty-valued and usc,
and the maximized value (which is, of course, just $V_{T-1}$) is continuous on $S$. Let
$g_{T-1}^*$ be any selection from $\Phi_{T-1}^*$. By construction, then, $(g_{T-1}^*, g_T^*)$ is an optimal
strategy for the two-period problem. Since it is also a Markovian strategy, it is a
Markovian optimal strategy for the two-period problem.
To show that there is a solution for any $t$, we use induction. Suppose that for some
$t \in \{1, \ldots, T-1\}$, we have shown the following:

1. A solution (that is, a Markovian optimal strategy $\{g_{t+1}^*, \ldots, g_T^*\}$) exists for the
$(T-t)$-period problem.
2. The value function $V_{t+1}$ of the $(T-t)$-period problem is continuous on $S$.

We will show that these properties are also true for the $(T-t+1)$-period problem.

Suppose the initial state of the $(T-t+1)$-period problem (i.e., at the beginning
of period $t$) is given by $s$. If the action $a \in \Phi_t(s)$ is taken, then an immediate
reward of $r_t(s, a)$ is received; the state at the beginning of period $t+1$ is then $f_t(s, a)$.
The maximum one can receive in the continuation is, by definition, $V_{t+1}[f_t(s, a)]$,
which can be obtained using $\{g_{t+1}^*, \ldots, g_T^*\}$. Thus, given the action $a$ in period $t$,
the maximum possible reward is

\[
r_t(s, a) + V_{t+1}[f_t(s, a)].
\]
It follows that $g_t^*$ can be part of an optimal strategy in the $(T-t+1)$-period problem
if, and only if, at each $s$, $g_t^*(s)$ solves:

\[
\max_{a \in \Phi_t(s)} \{ r_t(s, a) + V_{t+1}[f_t(s, a)] \}.
\]

By hypothesis, $r_t$ and $f_t$ are both continuous in $(s, a)$, and $V_{t+1}$ is continuous on $S$.
Therefore, the objective function in this problem is continuous on $S \times A$, and since
$\Phi_t$ is a continuous compact-valued correspondence, a maximum exists at each $s$; the
maximized value of the objective function (which is $V_t$) is continuous on $S$; and the
correspondence of maximizers $\Phi_t^*$ is usc on $S$. Letting $g_t^*$ denote any selection from
$\Phi_t^*$, we see that $\{g_t^*, \ldots, g_T^*\}$ is an optimal strategy for the $(T-t+1)$-period problem.
The induction step is complete.

Finally, note that the Bellman Equation at each $t$ has also been established in the
course of proving the induction argument. □
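
The construction in the proof translates directly into an algorithm when $S$ and $A$ are finite. The following Python sketch is an illustration under assumptions that are not in the text: states and actions are finite collections, the function name `backward_induction` and its argument names are hypothetical, and the convention $V_{T+1} \equiv 0$ is used so that the last-period problem takes the same form as the earlier ones. Ties among maximizers are broken by taking the first one found, which corresponds to picking one particular selection from the correspondence of maximizers.

```python
import math


def backward_induction(states, T, reward, transition, feasible):
    """Solve a finite FHDP by backwards induction.

    reward(t, s, a), transition(t, s, a) and feasible(t, s) take t = 1, ..., T.
    Returns (V, g), where V[t][s] is the value of the continuation problem
    starting at period t in state s, and g[t][s] is a maximizing action.
    """
    V = {T + 1: {s: 0.0 for s in states}}   # assumed convention: no reward after T
    g = {}
    for t in range(T, 0, -1):               # t = T, T-1, ..., 1
        V[t], g[t] = {}, {}
        for s in states:
            best_value, best_action = -math.inf, None
            for a in feasible(t, s):
                # Bellman equation: r_t(s, a) + V_{t+1}[f_t(s, a)]
                continuation = V[t + 1][transition(t, s, a)] if t < T else 0.0
                value = reward(t, s, a) + continuation
                if value > best_value:
                    best_value, best_action = value, a
            V[t][s], g[t][s] = best_value, best_action
    return V, g


# Hypothetical example: integer wealth 0..4, consume a in {0, ..., s}, utility sqrt(a).
V, g = backward_induction(
    states=range(5),
    T=2,
    reward=lambda t, s, a: math.sqrt(a),
    transition=lambda t, s, a: s - a,
    feasible=lambda t, s: range(s + 1),
)
print(g[1][4], V[1][4])
```

The returned `g` is a Markovian strategy in the sense of Section 11.4: each `g[t]` depends only on the period and the current state.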


11.6 An Example: The Consumption-Savings Problem 


We illustrate the use of backwards induction in this section in a simple finite-horizon
dynamic optimization problem. A consumer faces a $T$-period planning horizon,
where $T$ is a finite positive integer. He has an initial wealth of $w \in \mathbb{R}_+$. If he
begins period $t$ with a wealth of $w_t$, and consumes $c_t$ ($w_t \ge c_t \ge 0$) that period,
then his wealth at the beginning of the next period is $(w_t - c_t)(1 + r)$, where $r \ge 0$
is the interest rate. Consumption of $c$ in any period gives the consumer utility of
$u(c)$, where $u(\cdot)$ is a continuous, strictly increasing function on $\mathbb{R}_+$. The consumer’s
objective is to maximize utility over the $T$-period horizon.

We shall first set up the consumer’s maximization problem as a dynamic programming
problem. Then, assuming $u(c) = \sqrt{c}$, we shall obtain an exact solution.

The natural choice for the state space $S$ is the set of possible wealth levels, which
is $\mathbb{R}_+$. Similarly, the natural choice for the action space $A$ is the set of possible
consumption levels, which is also $\mathbb{R}_+$.

The reward function $r_t$ depends only on the consumption $c$ in any period, and
is given by $r_t(w, c) = u(c)$. The transition function $f_t$ is given by $f_t(w, c) =
(w - c)(1 + r)$. And, finally, the feasible action correspondence $\Phi_t$ is given by
$\Phi_t(w) = [0, w]$ for all $w$.

Now suppose $u(c) = \sqrt{c}$. For ease of notation, let $k = (1 + r)$. In the last period,
given a wealth level of $w$, the consumer solves:

\[
\max_{c \in [0, w]} u(c).
\]

It is easy to see that since $u(\cdot)$ is a strictly increasing function, the unique solution at
any $w$ is to consume everything; thus, the unique optimal strategy for the one-period
problem is $g_T(w) = w$ for all $w \in S$. The one-period value function $V_T$ is, then,

\[
V_T(w) = \sqrt{w}, \quad w \in S.
\]

Now, consider the two-period problem starting from some level $w \in S$. The action
$g_{T-1}(w)$ is part of an optimal strategy for the two-period problem if, and only if,
$g_{T-1}(w)$ solves

\[
\max_{c \in [0, w]} \{ u(c) + V_T[k(w - c)] \}.
\]
(Recall that $k = (1 + r)$.) Thus, $g_{T-1}(w)$ must solve

\[
\max_{c \in [0, w]} \{ \sqrt{c} + \sqrt{k(w - c)} \}.
\]
This is a strictly convex optimization problem. The first-order conditions are,
therefore, necessary and sufficient to identify a solution. A simple calculation reveals that
there is a unique solution at each $w$, given by $w/(1 + k)$. So, if $g_{T-1}$ is to be part of
an optimal strategy for the two-period problem, we must have
\[
g_{T-1}(w) = \frac{w}{1 + k}, \quad w \in S.
\]

Substituting this into the maximization problem, we see also that the two-period
value function $V_{T-1}$ is given by

\[
V_{T-1}(w) = (1 + k)^{1/2} w^{1/2}, \quad w \in S.
\]


Reworking this procedure for the three-period problem, the four-period problem, etc.,
is conceptually trivial, if a bit painful to carry out. When one actually does the
calculations, a pattern begins to emerge: we obtain

\[
V_t(w) = (1 + k + \cdots + k^{T-t})^{1/2} w^{1/2}
\]

and

\[
g_t(w) = \frac{w}{1 + k + \cdots + k^{T-t}}.
\]

To check that these are correct expressions we use induction. Suppose that these are
the forms for $V_\tau$ and $g_\tau$ for $\tau \in \{t+1, \ldots, T\}$. We will show that $V_t$ and $g_t$ also
have these forms.

In period $t$, at the state $w$, the consumer solves:

\[
\max_{c \in [0, w]} \{ \sqrt{c} + V_{t+1}[k(w - c)] \}
\]

where $V_{t+1}[k(w - c)] = (1 + k + \cdots + k^{T-t-1})^{1/2} [k(w - c)]^{1/2}$, by the induction
hypothesis. This is, once again, a strictly convex optimization problem, so $g_t(w)$
is uniquely determined by the solution. A simple calculation shows it is, in fact, given
by

\[
g_t(w) = \frac{w}{1 + k + \cdots + k^{T-t}}, \quad w \in S.
\]

Substituting this back into the objective function of the problem, we obtain

\[
V_t(w) = (1 + k + \cdots + k^{T-t})^{1/2} w^{1/2}, \quad w \in S.
\]

The induction step is complete. □
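
The closed forms for $g_t$ and $V_t$ can be checked numerically. The sketch below is a hedged illustration, not part of the original argument: it plugs the closed-form $V_{t+1}$ into the period-$t$ Bellman problem and maximizes over a fine grid on $[0, w]$. The function names, the grid size, and the parameter values ($T = 5$, $r = 0.05$, $w = 10$, $t = 2$) are all assumptions chosen for illustration.

```python
import math


def geometric_sum(k, n):
    """Return 1 + k + ... + k**n."""
    return sum(k ** j for j in range(n + 1))


def g_closed(t, w, T, k):
    """Closed-form consumption rule g_t(w) for u(c) = sqrt(c)."""
    return w / geometric_sum(k, T - t)


def V_closed(t, w, T, k):
    """Closed-form value function V_t(w) for u(c) = sqrt(c)."""
    return math.sqrt(geometric_sum(k, T - t) * w)


def bellman_grid(t, w, T, k, n=20001):
    """Maximize sqrt(c) + V_{t+1}[k (w - c)] over an n-point grid on [0, w], t < T."""
    best_c, best_v = 0.0, -math.inf
    for i in range(n):
        c = w * i / (n - 1)
        savings = max(w - c, 0.0)   # guard against tiny negative rounding error
        v = math.sqrt(c) + V_closed(t + 1, k * savings, T, k)
        if v > best_v:
            best_c, best_v = c, v
    return best_c, best_v


# Illustrative parameter values (assumptions, not from the text).
T, k, w, t = 5, 1.05, 10.0, 2
c_grid, v_grid = bellman_grid(t, w, T, k)
print(c_grid, g_closed(t, w, T, k))
print(v_grid, V_closed(t, w, T, k))
```

The grid maximizer should agree with $g_t(w)$ up to the grid spacing, and the maximized value should agree with $V_t(w)$ much more closely, since the objective is flat near its maximum.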




11.7 Exercises 


1. Redo the consumer’s multiperiod utility maximization problem of Section 11.6
for each of the following specifications of the utility function:

(a) $u(c) = c^\alpha$, $\alpha \in (0, 1)$.
(b) $u(c) = c$.
(c) $u(c) = c^\beta$, $\beta > 1$.


2. Consider the following problem of optimal harvesting of a natural resource. A
firm (say, a fishery) begins with a given stock $y > 0$ of a natural resource (fish).
In each period $t = 1, \ldots, T$ of a finite horizon, the firm must decide how much
of the resource to sell on the market that period. If the firm decides to sell $x$ units
of the resource, it receives a profit that period of $\pi(x)$, where $\pi\colon \mathbb{R}_+ \to \mathbb{R}$. The
amount $(y - x)$ of the resource left unharvested grows to an available amount
of $f(y - x)$ at the beginning of the next period, where $f\colon \mathbb{R}_+ \to \mathbb{R}_+$. The firm
wishes to choose a strategy that will maximize the sum of its profits over the
model’s $T$-period horizon.

(a) Set up the firm’s optimization problem as a finite-horizon dynamic programming
problem, i.e., describe precisely the state space $S$, the action space $A$,
the period-$t$ reward function $r_t$, the feasible action correspondence $\Phi_t$, etc.

(b) Describe sufficient conditions on $f$ and $\pi$ under which an optimal strategy
exists in this problem.

(c) Assuming $\pi(x) = \log x$ and $f(x) = x^\alpha$ ($0 < \alpha < 1$), solve this problem
for the firm’s optimal strategy using backwards induction.


3. Given $w_1, w_2, \ldots, w_T \in \mathbb{R}_{++}$ and $c \in \mathbb{R}_{++}$, express the following problem as a
dynamic programming problem, i.e., find $S$, $A$, $r_t$, $f_t$, and $\Phi_t$, and solve it using
backwards induction. The problem is:


f 
Maximize 5 sia? 
t=1 


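\[
\begin{aligned}
\text{subject to}\quad & \sum_{t=1}^{T} a_t = c \\
& a_t \ge 0, \quad t = 1, \ldots, T.
\end{aligned}
\]

4. Consider the following maximization problem:
\[
\begin{aligned}
\text{Maximize}\quad & \sum_{t=1}^{T} a_t^{\delta_t} \\
\text{subject to}\quad & \sum_{t=1}^{T} a_t = c \\
& a_t \ge 0, \quad t = 1, \ldots, T,
\end{aligned}
\]
where $\delta_1, \ldots, \delta_T$ and $c$ are given positive constants.

(a) Express this problem as a dynamic programming problem and solve it.
(b) Repeat this exercise with the constraints modified to

\[
\begin{aligned}
& \prod_{t=1}^{T} a_t = c \\
& a_t \ge 1, \quad t = 1, \ldots, T.
\end{aligned}
\]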


5. Express the following problem as a dynamic programming problem and solve it. 


Pist 
ta St + ar 


Maximize 


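\[
\begin{aligned}
\text{subject to}\quad & \sum_{t=1}^{T} a_t = c \\
& a_t \ge 0, \quad t = 1, \ldots, T
\end{aligned}
\]

where $p_t$ and $s_t$ are parameters such that:

\[
\sum_{t=1}^{T} p_t = 1, \qquad p_t \ge 0, \quad s_t \ge 0, \quad t = 1, \ldots, T.
\]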


6. Let an $m \times n$ matrix $A$ be given. Consider the problem of finding a path between
entries $a_{ij}$ in the matrix $A$ which (i) starts at $a_{11}$ and ends at $a_{mn}$, (ii) moves
only to the right or down, and (iii) maximizes the sum of the entries $a_{ij}$
encountered. Express this as a dynamic programming problem. Using backwards
induction, solve the problem when the matrix $A$ is given by:


4 9 3 6 3 
5 6 6 4 
cas ae ae T0 
43 5 1 9

7. Describe the following problem in a dynamic programming framework, and
solve it using backwards induction.

\[
\begin{aligned}
\text{Maximize}\quad & \prod_{t=1}^{T} c_t \\
\text{subject to}\quad & \sum_{t=1}^{T} c_t = c \\
& c_t \ge 0.
\end{aligned}
\]


8. A firm is faced with a choice of two technologies. The first (the known technology)
results in the firm incurring a constant unit cost of $c$ per unit of production. The
second technology (the unknown technology) has associated with it an initial
unit cost of production of $d > c$, but the cost of production declines over time as
this technology is used owing to a “learning by doing” effect. Suppose that if the
unknown technology has been used for $t$ periods, the per-unit cost of production
from using this for a $(t+1)$-th period is given by $w(t)$, where


w(t) = do + dje™ z 


where $d_0 < c$. (Note that we must have $d_0 + d_1 = d$, since $w(0) = d$.) Assume
that the firm produces exactly one unit of output per period, and is interested in
minimizing the sum of its costs of production over a $T$-period horizon.


(a) Express the optimization problem facing the firm as a dynamic programming 
problem, and write down the Bellman Equation. 

(b) Does this problem have a solution for any T? Why or why not? 

(c) As $T$ increases, show that it becomes more likely that the firm will use the
unknown technology in the following sense: if there exists an optimal strategy
in the $T$-horizon model in which the firm uses the unknown technology in
the first period, there exists an optimal strategy in the $(T+1)$-horizon model
in which the same statement is true.

(d) Fix any $T > 1$. Prove that if it is optimal for the firm to use the unknown
technology in the first period, then it is optimal for the firm to stay with the
unknown technology through the entire $T$-period horizon.


12 Stationary Discounted Dynamic Programming


The principal complication that arises in extending the results of the last chapter 
to dynamic programming problems with an infinite horizon is that infinite-horizon 
models lack a “last” period; this makes it impossible to use backwards induction 
techniques to derive an optimal strategy. In this chapter, we show that general condi- 
tions may, nonetheless, be described for the existence of an optimal strategy in such 
problems, although the process of actually deriving an optimal strategy is necessar- 
ily more complicated. A final section then studies the application of these results to 
obtaining and characterizing optimal strategies in the canonical model of dynamic 
economic theory: the one-sector model of economic growth. 


12.1 Description of the Framework 


A (deterministic) stationary discounted dynamic programming problem (henceforth,
SDP) is specified by a tuple $\{S, A, \Phi, f, r, \delta\}$, where

1. $S$ is the state space, or the set of environments, with generic element $s$. We
assume that $S \subset \mathbb{R}^n$ for some $n$.

2. $A$ is the action space, with typical element $a$. We assume that $A \subset \mathbb{R}^k$ for some
$k$.

3. $\Phi\colon S \to P(A)$ is the feasible action correspondence that specifies for each $s \in S$
the set $\Phi(s) \subset A$ of actions that are available at $s$.

4. $f\colon S \times A \to S$ is the transition function for the state, that specifies for each
current state-action pair $(s, a)$ the next-period state $f(s, a) \in S$.

5. $r\colon S \times A \to \mathbb{R}$ is the (one-period) reward function that specifies a reward $r(s, a)$
when the action $a$ is taken at the state $s$.

6. $\delta \in [0, 1)$ is the one-period discount factor.


The interpretation of this framework is similar to that of the finite-horizon model,
the one important difference being the horizon of the model, which is assumed to be
infinite here. An initial state $s_0 \in S$ is given. If the decision-maker takes an action $a_0$
in the set of actions $\Phi(s_0)$ that are feasible at this state, two things happen. First, the
decision-maker receives an immediate reward of $r(s_0, a_0)$. Second, the state moves
to its period-1 value $s_1 = f(s_0, a_0)$. The situation now repeats itself from $s_1$. The
decision-maker is presumed to discount future rewards by the factor $\delta \in [0, 1)$; thus,
if the action $a_t$ is taken in period $t$ at the state $s_t$, the reward is deemed to be worth
only $\delta^t r(s_t, a_t)$ today. As earlier, the aim of the decision-maker is to maximize the
sum of the rewards over the model’s (now infinite) horizon, i.e., to solve from any
given initial state $s_0 \in S$:

\[
\begin{aligned}
\text{Maximize}\quad & \sum_{t=0}^{\infty} \delta^t r(s_t, a_t) \\
\text{subject to}\quad & s_{t+1} = f(s_t, a_t), \quad t = 0, 1, 2, \ldots \\
& a_t \in \Phi(s_t), \quad t = 0, 1, 2, \ldots
\end{aligned}
\]

Once again, the notion of a “strategy” plays an important role in our analysis. We 
turn to a formal definition of this concept. 


12.2 Histories, Strategies, and the Value Function 


A $t$-history $h_t = \{s_0, a_0, \ldots, s_{t-1}, a_{t-1}, s_t\}$ for the SDP is a list of the state and action
in each period up to $t-1$, and the period-$t$ state. Let $H_0 = S$, and for $t = 1, 2, \ldots$,
let $H_t$ denote the set of all possible $t$-histories. Denote by $s_t[h_t]$ the period-$t$ state in
the history $h_t$.

A strategy $\sigma$ for the SDP is, as earlier, a contingency plan, which specifies an action
for the decision-maker as a function of the history that has transpired. Formally,
a strategy $\sigma$ is a sequence of functions $\{\sigma_t\}_{t=0}^{\infty}$ such that for each $t = 0, 1, \ldots$,
$\sigma_t\colon H_t \to A$ satisfies the feasibility requirement that $\sigma_t(h_t) \in \Phi(s_t[h_t])$. Let $\Sigma$
denote the space of all strategies.

Fix a strategy $\sigma$. From each given initial state $s \in S$, the strategy $\sigma$ determines
a unique sequence of states and actions $\{s_t(\sigma, s), a_t(\sigma, s)\}$ as follows: $s_0(\sigma, s) = s$,
and for $t = 0, 1, 2, \ldots$,

\[
\begin{aligned}
h_t(\sigma, s) &= \{s_0(\sigma, s), a_0(\sigma, s), \ldots, s_t(\sigma, s)\} \\
a_t(\sigma, s) &= \sigma_t(h_t(\sigma, s)) \\
s_{t+1}(\sigma, s) &= f(s_t(\sigma, s), a_t(\sigma, s)).
\end{aligned}
\]

Thus, each strategy $\sigma$ induces, from each initial state $s$, a period-$t$ reward $r_t(\sigma)(s)$,
where

\[
r_t(\sigma)(s) = r[s_t(\sigma, s), a_t(\sigma, s)].
\]

Let $W(\sigma)(s)$ denote the total discounted reward from $s$ under the strategy $\sigma$:
\[
W(\sigma)(s) = \sum_{t=0}^{\infty} \delta^t r_t(\sigma)(s).
\]
The value function $V\colon S \to \mathbb{R}$ of the SDP is defined as

\[
V(s) = \sup_{\sigma \in \Sigma} W(\sigma)(s).
\]

A strategy $\sigma^*$ is said to be an optimal strategy for the SDP if:
\[
W(\sigma^*)(s) = V(s), \quad s \in S.
\]

In words, an optimal strategy is one whose specifications yield a reward that cannot
be beaten from any initial state.


12.3 The Bellman Equation 


Simple examples show that several problems arise in the search for an optimal strategy 
unless we impose some restrictions on the structure of the SDP. 


Problem 1 Two or more strategies may yield infinite total rewards from some initial
states, yet may not appear equivalent from an intuitive standpoint. Consider the
following example:

Example 12.1 Let $S = A = \mathbb{R}_+$, and let $\Phi(s) = [0, s]$ for $s \in S$. Define the
transition and reward functions by
\[
f(s, a) = 3(s - a), \quad (s, a) \in S \times A,
\]
and
\[
r(s, a) = a, \quad (s, a) \in S \times A.
\]
Finally, let $\delta = 3/4$.

Suppose the initial state is $s_0 = 3$. It is easily checked that the sequence of actions
$\{a_t\}$ given by $a_t = (4/3)^t$ is feasible from $s_0$.¹ For each $t$,

\[
\delta^t r(s_t, a_t) = \left(\frac{3}{4}\right)^t \left(\frac{4}{3}\right)^t = 1,
\]

so the total reward associated with this sequence is $1 + 1 + 1 + \cdots = +\infty$. On the
other hand, the sequence of actions $\{a_t'\}$ defined by $a_t' = (3/2)(4/3)^t$, $t = 0, 1, 2, \ldots$,
is also feasible from $s_0 = 3$, and also yields infinite total reward. However, for each
$t$, $a_t' = (3/2)a_t$, and one gets the feeling that $\{a_t'\}$ is more “reasonable” as a possible
solution than $\{a_t\}$. □

¹We are being sloppy here in the interests of brevity. What we mean formally is that there is a strategy $\sigma$ in
this SDP such that from the initial state $s$, the strategy $\sigma$ results in the series of actions $\{a_t(\sigma, s)\} = \{(4/3)^t\}$.
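
The feasibility claims in Example 12.1 are easy to check by direct simulation. The following Python sketch is an illustration only (the function name `simulate` and the 30-period truncation are assumptions). It uses exact rational arithmetic, so that each discounted reward can be compared with 1 or 3/2 without rounding error.

```python
from fractions import Fraction


def simulate(actions, s0, periods):
    """Follow the action rule for a number of periods in Example 12.1.

    Raises ValueError if some action lies outside Phi(s) = [0, s].
    Returns the list of discounted rewards delta^t * r(s_t, a_t).
    """
    delta = Fraction(3, 4)
    s = Fraction(s0)
    discounted = []
    for t in range(periods):
        a = actions(t)
        if not (0 <= a <= s):
            raise ValueError(f"action {a} infeasible at state {s} in period {t}")
        discounted.append(delta ** t * a)   # r(s, a) = a
        s = 3 * (s - a)                     # f(s, a) = 3(s - a)
    return discounted


first = simulate(lambda t: Fraction(4, 3) ** t, 3, 30)
second = simulate(lambda t: Fraction(3, 2) * Fraction(4, 3) ** t, 3, 30)
print(sum(first), sum(second))
```

Over any finite truncation the second sequence earns 3/2 per period against 1 per period for the first, which is one way to express why it looks more "reasonable" even though both infinite sums diverge.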


Problem 2 No optimal strategy may exist even if $\sum_{t=0}^{\infty} \delta^t r(s_t, a_t)$ converges for each
feasible sequence $\{a_t\}$ from each initial state $s$. Consider the following example:

Example 12.2 Let $S = \{0\}$, $A = \{1, 2, 3, \ldots\}$, $\Phi(0) = A$, $f(0, a) = 0$, and
$r(0, a) = (a - 1)/a$. Let $\delta$ be any number in $(0, 1)$. It is easily seen that the supremum
of the objective function over the space of feasible sequences of actions is $(1 - \delta)^{-1}$,
but there is no feasible sequence $\{a_t\}$ such that $\sum_{t=0}^{\infty} \delta^t [(a_t - 1)/a_t] = (1 - \delta)^{-1}$. □
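
The two claims in Example 12.2 can be verified in a few lines. The derivation below is a worked sketch added for illustration; the constant sequences $a_t = n$ are chosen here for convenience and are not singled out in the text.

```latex
% Every feasible sequence falls strictly short of (1 - \delta)^{-1}:
\sum_{t=0}^{\infty} \delta^t \frac{a_t - 1}{a_t}
  = \sum_{t=0}^{\infty} \delta^t - \sum_{t=0}^{\infty} \frac{\delta^t}{a_t}
  < \sum_{t=0}^{\infty} \delta^t
  = \frac{1}{1 - \delta},
% since every term \delta^t / a_t is strictly positive.

% The bound is nevertheless approached: with a_t = n for all t,
\sum_{t=0}^{\infty} \delta^t \frac{n - 1}{n}
  = \frac{1}{1 - \delta}\left(1 - \frac{1}{n}\right)
  \longrightarrow \frac{1}{1 - \delta}
  \quad \text{as } n \to \infty.
```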


The first problem can be overcome if we assume $S$ and $A$ to be compact sets and
$r$ to be continuous on $S \times A$. In this case, $r$ is bounded on $S \times A$, so for any feasible
action sequence, total reward is finite since $\delta < 1$. More generally, we may directly
assume, as we shall, that $r$ is bounded on $S \times A$:

Assumption 1 There is a real number $K$ such that $|r(s, a)| \le K$ for all $(s, a) \in
S \times A$.


Assumption 1 is, by itself, sufficient to yield two useful conclusions. 


Lemma 12.3 Under Assumption 1, the value function $V$ is well defined as a function
from $S$ to $\mathbb{R}$, even if an optimal strategy does not exist. Moreover, given any $s \in S$,
and any $\epsilon > 0$, it is the case that there exists a strategy $\sigma$, depending possibly on $\epsilon$
and $s$, such that $W(\sigma)(s) \ge V(s) - \epsilon$.

Proof Fix any $s$. Since $r$ is bounded, it is the case that $|W(\sigma)(s)| \le K/(1 - \delta)$ for
any $\sigma$. Thus, $V(s) = \sup_\sigma W(\sigma)(s)$ must be finite, so $V\colon S \to \mathbb{R}$ is well defined.
Pick any $s$ and $\epsilon > 0$. If there were no strategy such that $W(\sigma)(s) \ge V(s) - \epsilon$, then
we must have $W(\sigma)(s) < V(s) - \epsilon$ for every $\sigma$, and so $V(s) = \sup_\sigma W(\sigma)(s) \le V(s) - \epsilon$,
which is absurd. □
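
The bound on $W(\sigma)(s)$ used at the start of the proof follows from the triangle inequality and the geometric series. It is written out below as a short worked step, using only Assumption 1 and $\delta \in [0, 1)$.

```latex
|W(\sigma)(s)|
  = \left| \sum_{t=0}^{\infty} \delta^t r_t(\sigma)(s) \right|
  \le \sum_{t=0}^{\infty} \delta^t \, |r_t(\sigma)(s)|
  \le K \sum_{t=0}^{\infty} \delta^t
  = \frac{K}{1 - \delta}.
```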


Overcoming the second problem is, of course, the central issue: when does the
SDP have a solution? The first step in examining the existence of a solution is the
Bellman Equation, or the Principle of Optimality. Intuitively, the Bellman Equation
is a statement of “dynamic consistency” of the optimal path, viz., that once on the
path there is never a tendency to move away from it.

Theorem 12.4 (The Bellman Equation) The value function $V$ satisfies the following
equation (the “Bellman Equation”) at each $s \in S$:

\[
V(s) = \sup_{a \in \Phi(s)} \{ r(s, a) + \delta V(f(s, a)) \}.
\]


Proof Let $W\colon S \to \mathbb{R}$ be defined by

\[
W(s) = \sup_{a \in \Phi(s)} \{ r(s, a) + \delta V(f(s, a)) \}.
\]
We will show that $V = W$, by first showing $V(s) \ge W(s)$ for all $s$, and then
$V(s) \le W(s)$ for all $s$. Fix any $s \in S$.

Pick any $\epsilon > 0$. Let $a$ be any action in $\Phi(s)$ and let $s' = f(s, a)$. Pick a strategy $\sigma$
that satisfies $W(\sigma)(s') \ge V(s') - \epsilon/\delta$. As we have observed, such a strategy always
exists by definition of $V$. Now the action $a$ followed by the strategy $\sigma$ is feasible
from $s$, since $a$ was chosen from $\Phi(s)$, and by definition all strategies specify only
feasible actions. So

\[
\begin{aligned}
V(s) &\ge r(s, a) + \delta W(\sigma)(f(s, a)) \\
&\ge r(s, a) + \delta V(f(s, a)) - \epsilon.
\end{aligned}
\]

But $a$ was picked arbitrarily; therefore, this holds for all $a \in \Phi(s)$. So

\[
V(s) \ge \sup_{a \in \Phi(s)} \{ r(s, a) + \delta V(f(s, a)) - \epsilon \}
\]

or, since $\epsilon > 0$ was also arbitrary,

\[
V(s) \ge \sup_{a \in \Phi(s)} \{ r(s, a) + \delta V(f(s, a)) \}.
\]

This completes step 1 of the proof.
Now fix an s ∈ S. Pick any ε > 0. Let σ be a strategy that satisfies

$$W(\sigma)(s) \ge V(s) - \varepsilon.$$

Let a = σ₀(s), and s₁ = f(s, a). Finally, let the continuation strategy specified by σ
after the first period be denoted by σ|₁. Since the strategy σ|₁ is certainly a feasible
continuation from s₁, we must have

$$V(s_1) \ge W(\sigma|_1)(s_1).$$

We now have

$$
\begin{aligned}
V(s) - \varepsilon &\le W(\sigma)(s) \\
                   &= r(s,a) + \delta W(\sigma|_1)[f(s,a)] \\
                   &\le r(s,a) + \delta V(f(s,a)).
\end{aligned}
$$

So

$$V(s) - \varepsilon \le \sup_{a \in \Phi(s)} \{ r(s,a) + \delta V(f(s,a)) \}.$$

Since ε is arbitrary, we have

$$V(s) \le \sup_{a \in \Phi(s)} \{ r(s,a) + \delta V(f(s,a)) \},$$

which completes the second step of the proof. □
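
As a quick sanity check of the Bellman Equation, consider a hypothetical one-state problem (an illustration only, not an example from the text): S = {s}, Φ(s) = {0, 1}, r(s, a) = a, f(s, a) = s, and δ ∈ [0, 1). Choosing a = 1 in every period is clearly best, and the value it yields satisfies the equation of Theorem 12.4:

```latex
% Hypothetical one-state SDP used only to illustrate Theorem 12.4.
\begin{align*}
V(s) &= \sup_{\sigma} \sum_{t=0}^{\infty} \delta^{t} a_t
      = \sum_{t=0}^{\infty} \delta^{t}
      = \frac{1}{1-\delta}, \\
\sup_{a \in \{0,1\}} \{ r(s,a) + \delta V(f(s,a)) \}
     &= \max\Bigl\{ \frac{\delta}{1-\delta},\; 1 + \frac{\delta}{1-\delta} \Bigr\}
      = \frac{1}{1-\delta}
      = V(s).
\end{align*}
```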


We shall use the Bellman Equation to identify a set of conditions under which the 
SDP has a solution. To do this, a technical digression is required. 


12.4 A Technical Digression 
12.4.1 Complete Metric Spaces and Cauchy Sequences 


A metric space is a pair (X, d) where X is a vector space,² and d: X × X → R is a
function satisfying the following conditions for all x, y, z ∈ X:


1. (Positivity) d(x, y) ≥ 0, with equality if and only if x = y;
2. (Symmetry) d(x, y) = d(y, x); and
3. (Triangle Inequality) d(x, z) ≤ d(x, y) + d(y, z).


A description of metric spaces and their properties may be found in Appendix C. 
We summarize here some of the properties that are important for this chapter. 

The canonical example of a metric space is k-dimensional Euclidean space, Rᵏ,
with the Euclidean metric. As we have seen in Chapter 1, the Euclidean metric meets 
the three conditions required in the definition of a metric. 

A second example, and one of some importance for this chapter, is the space
C(S) of all continuous and bounded functions from S ⊂ Rⁿ to R, endowed with the
“sup-norm metric” d, where d is defined by

$$d(f,g) = \sup_{y \in S} |f(y) - g(y)|, \qquad f, g \in C(S).$$


Since the absolute value |f(y) − g(y)| is nonnegative, and is zero if and only if
f(y) = g(y), we have d(f, g) ≥ 0, with equality if and only if f(y) = g(y) for all
y, i.e., if and only if f = g. Thus, the sup-norm metric meets the positivity condition.
It is evidently symmetric. Finally, pick any f, g, h ∈ C(S). For any y ∈ S, we have

$$
\begin{aligned}
|f(y) - h(y)| &\le |f(y) - g(y)| + |g(y) - h(y)| \\
              &\le d(f,g) + d(g,h).
\end{aligned}
$$


² Vector spaces are defined in Appendix C.

Since y ∈ S was arbitrary,

$$d(f,h) = \sup_{y \in S} |f(y) - h(y)| \le d(f,g) + d(g,h),$$

and the sup-norm metric d meets the triangle inequality condition also. Thus, d is a
metric on C(S).
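
The sup-norm distance can be approximated numerically by replacing the supremum with a maximum over a finite grid. The sketch below does this for three illustrative functions on S = [0, 1]; the grid, the functions, and the helper name `sup_distance` are assumptions made for the example, and a grid maximum is in general only a lower bound on the true supremum.

```python
import numpy as np

def sup_distance(f, g, grid):
    """Approximate d(f, g) = sup_y |f(y) - g(y)| by a maximum over a finite grid."""
    return float(np.max(np.abs(f(grid) - g(grid))))

# Hypothetical choices: S = [0, 1] and three bounded continuous functions on S.
grid = np.linspace(0.0, 1.0, 1001)
f = np.sin
g = np.cos
h = lambda y: y ** 2

d_fg = sup_distance(f, g, grid)
d_gh = sup_distance(g, h, grid)
d_fh = sup_distance(f, h, grid)

# The three metric properties, checked on the grid.
print("positivity:", d_fg >= 0 and sup_distance(f, f, grid) == 0)
print("symmetry:", d_fg == sup_distance(g, f, grid))
print("triangle inequality:", d_fh <= d_fg + d_gh)
```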

The definitions of a convergent sequence and a Cauchy sequence are generalized
from Rⁿ to abstract metric spaces in the obvious way. Given a sequence {xₙ} in a
metric space (X, d), we say that {xₙ} converges to x ∈ X (written xₙ → x) if it is
the case that the sequence of real numbers d(xₙ, x) converges to zero, i.e., if for any
ε > 0, there is n(ε) such that for all n ≥ n(ε), it is the case that d(xₙ, x) < ε.

A sequence {xₙ} in (X, d) is said to be a Cauchy sequence if for all ε > 0, there
is n(ε) such that for all n, m ≥ n(ε), it is the case that d(xₙ, xₘ) < ε.

A Cauchy sequence {xₙ} in (X, d) is said to converge if there is x ∈ X such that
d(xₙ, x) → 0 as n → ∞. (That the limit x be an element of X is an important part
of the definition.)

As in Rⁿ with the Euclidean metric, every convergent sequence in an abstract metric
space is also a Cauchy sequence. Unlike Rⁿ, however, the converse is not always
true. There exist metric spaces in which not all Cauchy sequences are convergent
sequences. A simple example is the following: let X be the open interval (0, 1), and
let d be the Euclidean metric on R restricted to (0, 1). Then, the sequence {xₙ} in X
defined by xₙ = 1/(n + 1) for all n is a Cauchy sequence in X that does not converge:
its only possible limit is 0, so there is no x ∈ X such that xₙ → x.

When a metric space has the property that all Cauchy sequences in the space 
converge, it is said to be a complete metric space. 


12.4.2 Contraction Mappings 
Let (X, d) be a metric space and T: X → X. For notational convenience in what
follows, we shall write Tx rather than T(x) to denote the value of T at a point x ∈ X.
The map T is said to be a contraction if there is ρ ∈ [0, 1) such that

$$d(Tx, Ty) \le \rho\, d(x, y), \qquad x, y \in X.$$

Example 12.5 Let X = [0, 1], and let d be the Euclidean metric restricted to [0, 1]. Let
T: X → X be defined by Tx = x/2. Then, d(Tx, Ty) = d(x/2, y/2) = d(x, y)/2,
so T is a contraction with ρ = 1/2. □


A contraction mapping T on a metric space (X, d) must necessarily be continuous
on that space. That is, it must be the case that for any sequence yₙ → y in X, we
have Tyₙ → Ty. This follows from applying the definition of a contraction: we have
d(Tyₙ, Ty) ≤ ρ d(yₙ, y) → 0 as n → ∞.


The importance of contraction mappings for us arises from the following powerful 
result, which is also known as the Banach Fixed Point Theorem: 


Theorem 12.6 (The Contraction Mapping Theorem) Let (X, d) be a complete
metric space, and T: X → X be a contraction. Then, T has a unique fixed point.


Remark Observe that the theorem asserts two results under the stated conditions:
(a) that a fixed point exists, and (b) that the fixed point is unique. □


Proof Uniqueness of the fixed point is easy to prove. Suppose x and y were both
fixed points of T, and x ≠ y. Then, we must have d(x, y) > 0. Since x and y are
both fixed points, we have Tx = x and Ty = y, so d(Tx, Ty) = d(x, y). On the
other hand, since T is a contraction, we must also have d(Tx, Ty) ≤ ρ d(x, y), and
since we must have ρ < 1, this means d(Tx, Ty) < d(x, y), a contradiction. Thus,
T can have at most one fixed point.

It remains to be shown that T has at least one fixed point. Pick any x ∈ X, and let
T²x denote T evaluated at Tx. By induction, define now Tⁿx to be T evaluated at
Tⁿ⁻¹x. Using the fact that T is a contraction, we will show that the sequence {Tⁿx}
is a Cauchy sequence in X.

To this end, note that d(Tⁿ⁺¹x, Tⁿx) ≤ ρ d(Tⁿx, Tⁿ⁻¹x). It follows that
d(Tⁿ⁺¹x, Tⁿx) ≤ ρⁿ d(Tx, x). Therefore, if m > n, we have

$$
\begin{aligned}
d(T^m x, T^n x) &\le d(T^m x, T^{m-1} x) + \cdots + d(T^{n+1} x, T^n x) \\
                &\le (\rho^{m-1} + \cdots + \rho^{n})\, d(Tx, x) \\
                &= (1 + \rho + \cdots + \rho^{m-n-1})\, \rho^{n}\, d(Tx, x) \\
                &\le [1 - \rho]^{-1} \rho^{n}\, d(Tx, x).
\end{aligned}
$$


Since x is fixed, d(Tx, x) is just a fixed number. Since ρ < 1, it follows that by
taking n sufficiently large, we can make d(Tᵐx, Tⁿx) as small as desired for any
m > n. Therefore, {Tⁿx} is a Cauchy sequence in X and, since X is complete, it
converges to a limit, say x*.

We will show that x* is a fixed point of T. Indeed, this is immediate: by the
continuity of T, we have limₙ T(yₙ) = T(limₙ yₙ) for any convergent sequence {yₙ},
so

$$T x^* = T\bigl(\lim_n T^n x\bigr) = \lim_n T(T^n x) = \lim_n T^{n+1} x = x^*.$$

The theorem is proved. □

It may be seen through examples that the conditions of the theorem cannot be
weakened:


Example 12.7 (The definition of the contraction cannot be weakened to allow for
ρ = 1.) Consider R₊ with the Euclidean metric. This is a complete metric space.
Let T: R₊ → R₊ be defined by T(x) = 1 + x. Then d(Tx, Ty) ≤ d(x, y), but T
has no fixed point. □


Indeed, an even more subtle result is true. It is not even sufficient that for each pair
(x, y) with x ≠ y, there exists ρ(x, y) < 1 such that d(Tx, Ty) ≤ ρ(x, y) d(x, y).
Equivalently, it is not sufficient that d(Tx, Ty) < d(x, y) for all x ≠ y. For an
example, see the Exercises.


Example 12.8 (The completeness of X cannot be weakened.) Let X = (0, 1) with
the Euclidean metric. Let T(x) = x/2. Then T is a contraction, but X is not complete
and T has no fixed point. □
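
The successive approximations Tⁿx used in the proof of the Contraction Mapping Theorem can be sketched numerically. The example below is a minimal illustration under assumed choices: the map T(x) = x/2 + 1 on R (a contraction with ρ = 1/2 whose fixed point is 2), the starting point, and the helper name `iterate_contraction` are all hypothetical. The stopping rule uses the bound ρⁿ d(Tx, x)/(1 − ρ) on the distance from Tⁿx to the fixed point, which follows from the Cauchy estimate in the proof by letting m → ∞.

```python
def iterate_contraction(T, x0, rho, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until rho**n / (1 - rho) * d(T x0, x0) < tol.

    Returns the last iterate and the whole path [x0, T x0, T^2 x0, ...].
    """
    d0 = abs(T(x0) - x0)          # d(Tx, x) for the starting point
    x = x0
    path = [x]
    for n in range(max_iter):
        bound = rho ** n / (1.0 - rho) * d0   # bound on d(T^n x0, x*)
        if bound < tol:
            break
        x = T(x)
        path.append(x)
    return x, path

# Hypothetical contraction on R with modulus 1/2 and fixed point x* = 2.
T = lambda x: 0.5 * x + 1.0
x_star, path = iterate_contraction(T, x0=0.0, rho=0.5)
print(x_star, len(path) - 1)   # approximate fixed point and number of iterations
```

The same loop applies to any contraction on a complete metric space once `abs` is replaced by the relevant metric.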


12.4.3 Uniform Convergence 
Let S ⊂ Rⁿ and {fₙ} be a sequence of functions from S to R. We say that the
sequence {fₙ} converges uniformly to a limit function f if for all ε > 0, there is an
integer N(ε), such that for all n ≥ N(ε), we have

$$|f_n(x) - f(x)| < \varepsilon \quad \text{for all } x \in S.$$

In words, fₙ converges uniformly to f if the distance between fₙ(x) and f(x) can
be made arbitrarily small simultaneously for all x, simply by taking n sufficiently
large.

It is immediate from the definitions of uniform convergence and of the sup-norm
that {fₙ} converges to f uniformly if and only if {fₙ} converges to f in the sup-norm
metric, i.e., if and only if

$$\sup_{x \in S} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty.$$


We will use this observation shortly to prove an important property about C (S), 
the space of continuous functions on S, endowed with the sup-norm. The following 
preliminary result, which is also of considerable independent interest, is needed first. 


Theorem 12.9 (Uniform Convergence Theorem) Let S ⊂ Rⁿ. Let {fₙ} be a
sequence of functions from S to R such that fₙ → f uniformly. If the functions fₙ
are all bounded and continuous, then f is also bounded and continuous.

Proof Boundedness of f follows from uniform convergence: there is N such that
|f_N(x) − f(x)| < 1 for all x ∈ S, so |f(x)| < |f_N(x)| + 1 on S, and f_N is bounded.
To show continuity of f at an arbitrary
point x ∈ S we need to show that for all ε > 0, there is δ > 0 such that y ∈ S and
‖x − y‖ < δ implies |f(x) − f(y)| < ε.

So let ε > 0 be given. For any k, the triangle inequality implies

$$|f(x) - f(y)| \le |f(x) - f_k(x)| + |f_k(x) - f_k(y)| + |f_k(y) - f(y)|.$$

Pick N sufficiently large so that for all n ≥ N, |f(z) − fₙ(z)| < ε/3 for all z ∈ S.
Since f_N is a continuous function, there is δ > 0 such that ‖x − y‖ < δ implies
|f_N(x) − f_N(y)| < ε/3. Therefore, whenever ‖x − y‖ < δ, we have

$$
\begin{aligned}
|f(x) - f(y)| &\le |f(x) - f_N(x)| + |f_N(x) - f_N(y)| + |f_N(y) - f(y)| \\
              &< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 \\
              &= \varepsilon.
\end{aligned}
$$

The theorem is proved. □


Our next result plays a significant role in establishing existence of an optimal 
strategy in stationary dynamic programming problems. 


Theorem 12.10 The space C(S) of continuous, bounded real-valued functions on
the set S ⊂ Rⁿ is a complete metric space when endowed with the sup-norm metric.


Proof Let {fₙ} be a Cauchy sequence in C(S). Pick any x ∈ S. Then, {fₙ(x)} is a Cauchy
sequence in R. Since R is complete, this Cauchy sequence converges to a limit. Call
this limit f(x). This defines a function f: S → R such that fₙ(x) → f(x) for each
x ∈ S. We will first show that {fₙ} converges uniformly to f. This will establish, by
the Uniform Convergence Theorem, that f is bounded and continuous on S.

Since {fₙ} is a Cauchy sequence in C(S), it is the case that given any ε > 0, there
is n(ε) such that m, n ≥ n(ε) implies

$$|f_m(x) - f_n(x)| < \varepsilon, \qquad x \in S.$$

Holding n fixed and letting m → ∞, we obtain

$$|f(x) - f_n(x)| \le \varepsilon, \qquad x \in S.$$

This says precisely that {fₙ} converges uniformly to f.

By the Uniform Convergence Theorem, f is a bounded and continuous function, so
f ∈ C(S). Since uniform
convergence is the same as sup-norm convergence, we have shown that an arbitrary
Cauchy sequence in C(S) converges in the sup-norm to a limit in C(S). Thus, C(S)
is complete when endowed with the sup-norm. □

We close this section with an important remark. It is necessary to distinguish
uniform convergence from another notion of convergence in functional spaces,
that of pointwise convergence. We say that a sequence of functions {fₙ} converges
pointwise to a limit function f if for each fixed x the following condition holds: for all
ε > 0, there exists an integer N(ε), such that for all n ≥ N(ε), |fₙ(x) − f(x)| < ε.

Clearly a sequence of functions converges uniformly only if it converges pointwise. The
converse, however, is not true. The key distinction is that in pointwise convergence we
fix x first and see if the sequence of real numbers {fₙ(x)} converges to f(x). On the
other hand, uniform convergence requires not only that fₙ(x) → f(x) for each x,
but also (roughly speaking) that the “rate of convergence” of fₙ(x) to f(x) be the
same for all x. That is, for uniform convergence to hold, the same choice of N(ε)
must make |fₙ(x) − f(x)| < ε for all n ≥ N(ε), for all x, whereas in the definition
of pointwise convergence, the choice of N(ε) may depend on x. To illustrate this
vital distinction, consider the following example.


Example 12.11 Let {fₙ} be a sequence of functions from R₊ to R₊, defined by

$$
f_n(x) =
\begin{cases}
nx, & x \le 1/n \\
1,  & x > 1/n.
\end{cases}
$$

Then, for each fixed x > 0, fₙ(x) → 1 as n → ∞, while fₙ(0) = 0 for all n, so fₙ
converges pointwise to f defined by f(0) = 0, f(x) = 1 for x > 0. However, fₙ
does not converge uniformly to f (if it did, f would be continuous by the Uniform
Convergence Theorem). □
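
The failure of uniform convergence in Example 12.11 can also be seen directly: at the point x = 1/(2n) the gap |fₙ(x) − f(x)| equals 1/2 for every n, so the sup-norm distance between fₙ and f never falls below 1/2. The short sketch below checks this numerically; the function names and the sample values of n and x are illustrative choices.

```python
def f_n(n, x):
    """f_n(x) = n*x for x <= 1/n and 1 for x > 1/n (Example 12.11)."""
    return n * x if x <= 1.0 / n else 1.0

def f_limit(x):
    """Pointwise limit: f(0) = 0 and f(x) = 1 for x > 0."""
    return 0.0 if x == 0 else 1.0

# Pointwise convergence at a fixed x > 0: f_n(x) reaches 1 once n >= 1/x.
x_fixed = 0.01
values_at_x = [f_n(n, x_fixed) for n in (10, 100, 1000)]

# No uniform convergence: at x = 1/(2n) the gap |f_n(x) - f(x)| stays at 1/2.
gaps = [abs(f_n(n, 1.0 / (2 * n)) - f_limit(1.0 / (2 * n)))
        for n in (10, 100, 1000, 10**6)]
print(values_at_x, gaps)
```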


12.5 Existence of an Optimal Strategy 


We are now in a position to outline a set of assumptions on the SDP under which 
the existence of an optimal strategy can be guaranteed. We proceed in three stages. 
Recall that under Assumption 1, we have shown that the value function V: S → R
satisfies the Bellman Equation at all s ∈ S:

$$V(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta V(f(s,a)) \}.$$

In subsection 12.5.1, we establish that a strategy σ in the SDP is an optimal strategy
if and only if the total payoff W(σ) under σ also meets the Bellman Equation at each
s ∈ S:

$$W(\sigma)(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta W(\sigma)(f(s,a)) \}.$$

In subsection 12.5.2, we identify a particularly useful class of strategies called
stationary strategies. Finally, in subsection 12.5.3, we show that under suitable
assumptions, there exists a stationary strategy π such that W(π) meets the Bellman
Equation, which completes the proof.


12.5.1 A Preliminary Result 


Let B(S) denote the set of all bounded functions from S to R. Note that under
Assumption 1, we have V ∈ B(S). Endow B(S) with the sup-norm metric, i.e., for
v, w ∈ B(S), let the distance d(v, w) be given by

$$d(v, w) = \sup_{y \in S} |v(y) - w(y)|.$$


Lemma 12.12 B(S) is a complete metric space when endowed with the sup-norm 
metric. 


Proof Let {vₙ} be a Cauchy sequence in B(S). Then, for each y ∈ S, {vₙ(y)} is a
Cauchy sequence in R. If we let v(y) denote the limit of this sequence, we obtain a
function v: S → R. Since each vₙ is bounded and {vₙ} is Cauchy in the sup-norm,
v is bounded, so v ∈ B(S). That {vₙ} converges uniformly to v is established along
the lines of the proof of Theorem 12.10 on the completeness of C(S) under the
sup-norm metric. □


Now, define a map T on B(S) as follows: for w ∈ B(S), let Tw be that function
whose value at any s ∈ S is specified by

$$Tw(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta w(f(s,a)) \}.$$

Since r and w are bounded by assumption, it follows that so is Tw, and therefore
that Tw ∈ B(S). So T maps B(S) into itself. We first establish an important result
concerning the fixed points of T.


Theorem 12.13 V is the unique fixed point of the operator T. That is, if w ∈ B(S)
is any function satisfying

$$w(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta w(f(s,a)) \}$$

at each s ∈ S, then we must have w = V.

Proof That V is a fixed point of T is immediate from the definition of T and
Theorem 12.4. We will show it is the only fixed point of T, by showing that T is a
contraction on B(S). Since B(S) has been shown to be a complete metric space, the
Contraction Mapping Theorem will then imply that T has a unique fixed point on
B(S).

The proof that T is a contraction is simplified by the following lemma:


Lemma 12.14 (Contraction Mapping Lemma) Let L: B(S) → B(S) be any map
that satisfies


1. (Monotonicity) w ≥ v ⇒ Lw ≥ Lv.
2. (Discounting) There is β ∈ [0, 1) such that L(w + c) = Lw + βc for all
w ∈ B(S) and c ∈ R.


Then, L is a contraction.


Remark Like most shorthand notation, the notation in the lemma is sloppy. First,
w ≥ v means w(s) ≥ v(s) for all s ∈ S. Secondly, for w ∈ B(S) and c ∈ R, w + c
is the function which at any s assumes the value w(s) + c. Under this interpretation,
it is clearly the case that whenever w ∈ B(S), we also have (w + c) ∈ B(S), so
L(w + c) is well defined.


Proof of the Contraction Mapping Lemma Let v, w ∈ B(S). Clearly for any
s ∈ S, we have

$$w(s) - v(s) \le \sup_{s \in S} |w(s) - v(s)| = \|w - v\|.$$

So w(s) ≤ v(s) + ‖w − v‖. Applying monotonicity and discounting in order, we
now have

$$Lw(s) \le L(v + \|w - v\|)(s) \le Lv(s) + \beta \|w - v\|,$$

so that [Lw(s) − Lv(s)] ≤ β‖w − v‖. Now, v(s) − w(s) ≤ ‖w − v‖, so going
through the same procedure yields

$$Lv(s) \le L(w + \|w - v\|)(s) \le Lw(s) + \beta \|w - v\|,$$

or [Lv(s) − Lw(s)] ≤ β‖w − v‖. Combining this with the previous inequality, we
obtain

$$|Lw(s) - Lv(s)| \le \beta \|w - v\|,$$

and so sup_{s∈S} |Lw(s) − Lv(s)| ≤ β‖w − v‖. Equivalently, ‖Lw − Lv‖ ≤ β‖w − v‖.
Since β ∈ [0, 1), we are done. □


To complete the proof of Theorem 12.13, we apply the Contraction Mapping
Lemma to the mapping T. Note that T trivially satisfies monotonicity (a larger
function can only give a higher supremum). Furthermore,

$$
\begin{aligned}
T(w + c)(s) &= \sup_{a \in \Phi(s)} \{ r(s,a) + \delta (w + c)(f(s,a)) \} \\
            &= \sup_{a \in \Phi(s)} \{ r(s,a) + \delta w(f(s,a)) + \delta c \} \\
            &= Tw(s) + \delta c.
\end{aligned}
$$

Since δ ∈ [0, 1) by hypothesis, T satisfies the discounting property also. Therefore,
T is a contraction on B(S), and the theorem is proved. □
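
On a finite problem the operator T and its contraction property can be checked directly. The sketch below is an illustration under stated assumptions: a randomly generated problem with 4 states and 3 actions, every action feasible at every state, rewards stored in an array `r` and the transition function f stored as an integer array `nxt`. None of these names or numbers come from the text. With finitely many actions the supremum is a maximum.

```python
import numpy as np

def bellman_operator(w, r, nxt, delta):
    """(Tw)(s) = max_a { r[s, a] + delta * w[nxt[s, a]] } on a finite problem."""
    return np.max(r + delta * w[nxt], axis=1)

# Hypothetical finite SDP: 4 states, 3 actions, every action feasible everywhere.
rng = np.random.default_rng(0)
n_states, n_actions, delta = 4, 3, 0.9
r = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))        # bounded rewards
nxt = rng.integers(0, n_states, size=(n_states, n_actions))   # next state f(s, a)

# Contraction check: ||Tw - Tv|| <= delta * ||w - v|| for two arbitrary functions.
w = rng.normal(size=n_states)
v = rng.normal(size=n_states)
lhs = np.max(np.abs(bellman_operator(w, r, nxt, delta)
                    - bellman_operator(v, r, nxt, delta)))
rhs = delta * np.max(np.abs(w - v))

# Iterating T from any starting point approaches its unique fixed point.
V = np.zeros(n_states)
for _ in range(500):
    V = bellman_operator(V, r, nxt, delta)
residual = np.max(np.abs(bellman_operator(V, r, nxt, delta) - V))
print(lhs <= rhs, residual)
```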


We can now establish the main result of this subsection. 


Theorem 12.15 Under Assumption 1, a strategy σ in the SDP is an optimal strategy
if, and only if, W(σ) satisfies the following equation at each s ∈ S:

$$W(\sigma)(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta W(\sigma)(f(s,a)) \}.$$

Proof First, suppose that W(σ) satisfies the given equation at each s ∈ S. Since r
is bounded by Assumption 1, W(σ) is evidently bounded. But then we must have
W(σ) = V since (by Theorem 12.13) V is the unique bounded function that satisfies
this equation at all s. And, of course, W(σ) = V says precisely that σ is an optimal
strategy.

Now suppose σ is an optimal strategy. By definition of V, we must have W(σ) = V,
so by Theorem 12.4 we certainly have

$$W(\sigma)(s) = \sup_{a \in \Phi(s)} \{ r(s,a) + \delta W(\sigma)(f(s,a)) \},$$

at all s ∈ S, as required. □


By Theorem 12.15, the SDP will be solved if we can demonstrate the existence of
a strategy σ such that W(σ) satisfies the Bellman Equation at each s ∈ S. We do this
in subsection 12.5.3 after first identifying an especially simple and attractive class of
strategies.


12.5.2 Stationary Strategies 
The aim of this section is to single out a particularly useful class of strategies, the
class of stationary strategies. The definition of stationary strategies is motivated
by the observation that any t-history hₜ = (s₁, a₁, …, sₜ) in a stationary dynamic
programming problem {S, A, r, f, Φ, δ} from the initial state s = s₁ simply results
in the same stationary dynamic programming problem {S, A, r, f, Φ, δ}, but with
initial state s = sₜ. Intuitively, this appears to imply that there is no extra gain to be
made by conditioning the strategy on anything more than the current state, and not
even the date (i.e., the time period) on which this state was reached. A strategy which
depends solely on the current state in this fashion is called a stationary Markovian
strategy, or simply a stationary strategy.

For a more formal definition, a Markovian strategy σ for the SDP is defined to be
a strategy where for each t, σₜ depends on hₜ only through t and the period-t state
under hₜ, sₜ[hₜ]. Effectively, a Markovian strategy may be thought of as a sequence
{πₜ} where for each t, πₜ is a mapping from S to A satisfying πₜ(s) ∈ Φ(s) for each
s. The interpretation is that πₜ(s) is the action to be taken in period t, if the state at
the beginning of period t is s.

A stationary strategy is a Markovian strategy {πₜ} which satisfies the further condition
that πₜ = π_τ (= π, say) for all t and τ. Thus, in a stationary strategy, the
action taken in any period t depends only on the state at the beginning of that period,
and not even on the value of t. It is usual to denote such a strategy by π^(∞), but, for
notational simplicity, we shall denote such a strategy simply by the function π.

Finally, a stationary optimal strategy is a stationary strategy that is also an optimal
strategy.
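
Because a stationary strategy is just a function π from states to feasible actions, the total discounted reward it generates can be approximated by simulating the path it induces and truncating the infinite sum. The sketch below does this for a hypothetical two-state problem; the reward function, the transition function, the discount factor, and the helper name `total_discounted_reward` are all assumptions made for the illustration.

```python
def total_discounted_reward(pi, s0, r, f, delta, horizon=2000):
    """Approximate W(pi)(s0) = sum_t delta**t * r(s_t, pi(s_t)), truncated at `horizon`."""
    total, s, discount = 0.0, s0, 1.0
    for _ in range(horizon):
        a = pi(s)                  # stationary: the action depends only on the state
        total += discount * r(s, a)
        s = f(s, a)                # deterministic transition to the next state
        discount *= delta
    return total

# Hypothetical two-state problem: states 0 and 1, actions 0 (stay) and 1 (switch).
r = lambda s, a: 1.0 if s == 1 else 0.0     # reward 1 whenever the current state is 1
f = lambda s, a: s if a == 0 else 1 - s
pi = lambda s: 1 if s == 0 else 0           # move to state 1, then stay there

W0 = total_discounted_reward(pi, 0, r, f, delta=0.5)
W1 = total_discounted_reward(pi, 1, r, f, delta=0.5)
print(W0, W1)
```

With bounded rewards the truncation error is at most δ^horizon/(1 − δ) times the bound on |r|, so a long horizon makes the approximation as accurate as needed.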


12.5.3 Existence of an Optimal Strategy 


We have already assumed that: 


Assumption 1 r is bounded on S × A.


We now add the following assumptions:

Assumption 2 r is continuous on S × A.

Assumption 3 f is continuous on S × A.

Assumption 4 Φ is a continuous, compact-valued correspondence on S.


Under Assumptions 1-4, we will show the existence of a stationary strategy π*
such that W(π*) meets the Bellman Principle of Optimality at all s ∈ S. Our derivation
of this strategy is carried out in a series of three steps. The content of these steps
is as follows:


Step 1 We first show that there is a unique continuous function w*: S → R such
that w* meets the Bellman Principle of Optimality at all s ∈ S:

$$w^*(s) = \max_{a \in \Phi(s)} \{ r(s,a) + \delta w^*(f(s,a)) \}.$$

Step 2 Define G*: S → P(A) by

$$G^*(s) = \arg\max_{a \in \Phi(s)} \{ r(s,a) + \delta w^*(f(s,a)) \}.$$

We show that G* is well defined and admits a selection π*, i.e., there is a function
π*: S → A satisfying π*(s) ∈ G*(s) for all s ∈ S. The function π* defines a
stationary strategy which satisfies, by definition,

$$w^*(s) = r(s, \pi^*(s)) + \delta w^*[f(s, \pi^*(s))],$$

for all s ∈ S.


Step 3 Finally, we shall show that the total discounted reward W(π*)(s) under the
stationary strategy π* defined in Step 2, from the initial state s, satisfies

$$W(\pi^*)(s) = w^*(s).$$

By Step 1, therefore, W(π*) is a fixed point of the mapping T. Thus, by Theorem
12.15, π* is a stationary optimal strategy.


Step 1


Let C(S) be the space of all real-valued continuous and bounded functions on S. Endow
C(S) with the sup-norm metric. We have already seen that C(S) is then a complete metric
space. Now, define the function T on C(S) as in subsection 12.5.1. That is, for w ∈ C(S),
let Tw be the function whose value at any s ∈ S is given by:

$$Tw(s) = \max_{a \in \Phi(s)} \{ r(s,a) + \delta w(f(s,a)) \}.$$

Tw is evidently bounded on S since w and r are bounded functions. Moreover,
since w is continuous and f is continuous, w ∘ f is continuous as the composition
of continuous functions. By assumption, r is continuous on S × A, therefore the
expression in braces on the RHS is continuous on S × A. For fixed s, Φ(s) is
compact by hypothesis, so the maximum is well defined. The Maximum Theorem
now implies that Tw is also continuous on S. Therefore:


Lemma 12.16 T maps C(S) into C(S). 


For w, v ∈ C(S), it is immediate from the definition of T that w ≥ v implies
Tw ≥ Tv, since a larger function can only give a larger maximum. Moreover,
for any w ∈ C(S) and c ∈ R, we clearly have (w + c) ∈ C(S), and, moreover,
T(w + c) = Tw + δc. By mimicking the proof of the Contraction Mapping Lemma,
it is easy to show that


Lemma 12.17 T: C(S) → C(S) is a contraction.

The following result, which obtains by combining these results, completes Step 1.


Lemma 12.18 T has a unique fixed point w* ∈ C(S). That is, there is a unique
w* ∈ C(S) that satisfies the following equation at each s ∈ S:

$$w^*(s) = \max_{a \in \Phi(s)} \{ r(s,a) + \delta w^*(f(s,a)) \}.$$


Step 2

Define G*: S → P(A) by

$$G^*(s) = \arg\max_{a \in \Phi(s)} \{ r(s,a) + \delta w^*(f(s,a)) \}.$$

By the Maximum Theorem, G* is a (nonempty-valued) usc correspondence. Thus,
there is a function π*: S → A such that for each s ∈ S, π*(s) ∈ G*(s) ⊂ Φ(s).
The function π* defines a stationary strategy that, by definition, satisfies at
all s ∈ S,

$$
\begin{aligned}
w^*(s) &= r(s, \pi^*(s)) + \delta w^*[f(s, \pi^*(s))] \\
       &\ge r(s,a) + \delta w^*[f(s,a)], \qquad a \in \Phi(s).
\end{aligned}
$$

This completes Step 2.


Step 3


Define π* as in Step 2, and pick any initial state s ∈ S. Recall that {sₜ(π*, s), aₜ(π*, s)}
denotes the sequence of states and actions that result from s under π*, and that
rₜ(π*)(s) = r[sₜ(π*, s), aₜ(π*, s)] is the period-t reward under π* from s. For
notational ease, let sₜ = sₜ(π*, s) and aₜ = aₜ(π*, s). By definition of W(π*), we
have

$$W(\pi^*)(s) = \sum_{t=0}^{\infty} \delta^t r_t(\pi^*)(s).$$

On the other hand, by Step 2, we also have

$$
\begin{aligned}
w^*(s) &= r(s, \pi^*(s)) + \delta w^*[f(s, \pi^*(s))] \\
       &= r(s, \pi^*(s)) + \delta w^*(s_1) \\
       &= r(s, \pi^*(s)) + \delta r(s_1, a_1) + \delta^2 w^*[f(s_1, a_1)] \\
       &= r_0(\pi^*)(s) + \delta r_1(\pi^*)(s) + \delta^2 w^*[s_2(\pi^*, s)].
\end{aligned}
$$

Iterating, we obtain for any integer T:

$$w^*(s) = \sum_{t=0}^{T-1} \delta^t r_t(\pi^*)(s) + \delta^T w^*[s_T(\pi^*, s)].$$

Since w* is bounded and δ < 1, letting T → ∞ yields

$$w^*(s) = \sum_{t=0}^{\infty} \delta^t r_t(\pi^*)(s),$$

so w* = W(π*). Since Step 1 established that

$$w^*(s) = \max_{a \in \Phi(s)} \{ r(s,a) + \delta w^*(f(s,a)) \},$$

it follows that

$$W(\pi^*)(s) = \max_{a \in \Phi(s)} \{ r(s,a) + \delta W(\pi^*)(f(s,a)) \}.$$

By Theorem 12.15, this equation establishes that π* is an optimal strategy. Since
π* is also stationary, we have shown that a stationary optimal strategy exists under
Assumptions 1-4. We summarize in the following theorem.


Theorem 12.19 Suppose the SDP {S, A, Φ, f, r, δ} satisfies the following conditions:


1. r: S × A → R is continuous and bounded on S × A.
2. f: S × A → S is continuous on S × A.
3. Φ: S → P(A) is a compact-valued, continuous correspondence.


Then, there exists a stationary optimal policy π*. Furthermore, the value function
V = W(π*) is continuous on S, and is the unique bounded function that satisfies
the Bellman Equation at each s ∈ S:

$$
\begin{aligned}
W(\pi^*)(s) &= \max_{a \in \Phi(s)} \{ r(s,a) + \delta W(\pi^*)(f(s,a)) \} \\
            &= r(s, \pi^*(s)) + \delta W(\pi^*)[f(s, \pi^*(s))].
\end{aligned}
$$
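
The three steps of the existence argument can be traced on a finite problem, where continuity and compactness hold trivially. The sketch below is an illustration under assumed data: a randomly generated problem with 5 states and 3 actions, all actions feasible at every state, and array names `r`, `nxt`, `pi` chosen for the example. It approximates w* by iterating T, selects a maximizer π*(s) at each state, and then computes W(π*) exactly by solving the linear system W = r_π + δ P_π W.

```python
import numpy as np

# Hypothetical finite SDP (all names and numbers are illustrative assumptions).
rng = np.random.default_rng(1)
n_states, n_actions, delta = 5, 3, 0.95
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))         # rewards r(s, a)
nxt = rng.integers(0, n_states, size=(n_states, n_actions))   # next state f(s, a)
states = np.arange(n_states)

# Step 1: approximate the fixed point w* of T by repeated application.
w = np.zeros(n_states)
for _ in range(2000):
    w = np.max(r + delta * w[nxt], axis=1)

# Step 2: select a maximizer pi*(s) of r(s, a) + delta * w*(f(s, a)) at each state.
pi = np.argmax(r + delta * w[nxt], axis=1)

# Step 3: total discounted reward of the stationary strategy pi*.
# W solves W(s) = r(s, pi(s)) + delta * W(f(s, pi(s))), a linear system.
P = np.zeros((n_states, n_states))
P[states, nxt[states, pi]] = 1.0
W_pi = np.linalg.solve(np.eye(n_states) - delta * P, r[states, pi])

print(np.max(np.abs(W_pi - w)))   # distance between W(pi*) and w*
```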


12.6 An Example: The Optimal Growth Model 


The one-sector model of optimal growth is a very popular framework in neoclas- 
sical economics. It studies the problem of a single agent (a “social planner”) who 
maximizes discounted utility from consumption over an infinite horizon subject to 
technological constraints. It offers an excellent illustration of how convexity condi- 
tions can be combined with the continuity conditions on the primitives to provide a 
very sharp characterization of the solution.


12.6.1 The Model


The basic model of optimal growth is very simple. There is a single good (the
metaphorical corn of neoclassical theory) which may be consumed or invested. The
conversion of investment to output takes one period and is achieved through a production
function f: R₊ → R₊. Thus, if xₜ denotes period-t investment, the output
available in period (t + 1), denoted yₜ₊₁, is given by f(xₜ). The agent begins with
an initial endowment of y = y₀ ∈ R₊₊. In each period t = 0, 1, 2, …, the agent
observes the available stock yₜ and decides on the division of this stock between
period-t consumption cₜ and period-t investment xₜ. Consumption of cₜ in period t
gives the consumer instantaneous utility of u(cₜ), where u: R₊ → R is a utility function.
The agent discounts future utility by the discount factor δ ∈ [0, 1), and wishes
to maximize total discounted utility from lifetime consumption. Thus, the problem
is to solve:

$$
\begin{aligned}
\text{Maximize} \quad & \sum_{t=0}^{\infty} \delta^t u(c_t) \\
\text{subject to} \quad & y_0 = y \\
 & y_{t+1} = f(x_t), \quad t = 0, 1, 2, \ldots \\
 & c_t + x_t = y_t, \quad t = 0, 1, 2, \ldots \\
 & c_t, x_t \ge 0, \quad t = 0, 1, 2, \ldots
\end{aligned}
$$

The optimal growth model may be cast in a dynamic programming framework
using the following definitions. Let S* = R₊ be the state space, and A* = R₊ be
the action space. Let Φ(y) = [0, y] be the feasible action correspondence taking
states y ∈ S* into the set of feasible actions [0, y] ⊂ A* at y. Let r(y, c) = u(c)
be the reward from taking the action c ∈ Φ(y) at the state y ∈ S*. Finally, let
F(y, c) = f(y − c) be the transition function taking current state-action pairs (y, c)
into future states F(y, c). The tuple {S*, A*, Φ, r, F, δ} now defines a stationary
discounted dynamic programming problem, which represents the optimal growth
model.

We are interested in several questions concerning this model. These include:


1. Under what conditions on u and f do optimal strategies exist in this model?

2. When is the optimal strategy unique?

3. What can one say about the dynamic implications of the optimal strategy? For
instance, letting {yₜ, cₜ} denote the evolution of state-action levels under the
optimal strategy from an arbitrary initial state (i.e., stock level) y₀:


(a) Will the sequences {yₜ} and {cₜ} be monotone, or will one (or both) exhibit
cyclical tendencies?

(b) If the sequences are monotone, what are the properties of their limiting values
y* and c*? In particular, are these limiting values independent of the initial
state y₀ (that is, are the “long-run” implications of growth independent of
where one starts)?

(c) Is it possible to use analogues of first-order conditions to characterize behavior
on the optimal path?


In subsection 12.6.2 below, we tackle the question of existence of an optimal 
strategy. We show that under minimal continuity conditions, and a boundedness 
condition on f that enables compactification of the state space, it is possible to show 
that stationary optimal strategies exist. 

The characterization of optimal strategies is taken up in subsection 12.6.3 under
added strict concavity assumptions on f and u. We show that these assumptions
carry several strong implications, including the following:³


1. If u is strictly concave, the optimal level of savings increases with stock levels.
Therefore, the sequence of states {yₜ} that results from any initial state under the
optimal plan is monotone, and converges to a limiting value y*.⁴

2. If f and u are both strictly concave:


(a) The optimal consumption sequence {cₜ} from any initial state is also monotone,
and converges to a limit c*.

(b) The limiting values y* and c* are independent of the initial state; curiously,
they are even independent of the properties of the utility function u, and
depend only on the properties of the technology f and the discount factor
δ.⁵

(c) The right-hand side of the Bellman Equation is strictly concave in y and
c, so a unique maximizing action exists for each y. As a consequence, the
optimal strategy is unique.

(d) If f and u are also continuously differentiable, then the optimal path may
also be characterized using a dynamic analogue of first-order conditions
known as the Ramsey-Euler Equations.


12.6.2 Existence of Optimal Strategies 


We cannot directly appeal to Theorem 12.19 to establish existence of optimal strategies
in this problem, because u may be unbounded on R₊. Rather than impose
boundedness on the framework as an assumption, we consider a more natural and
plausible restriction which will ensure that we may, without loss of generality, restrict
S* and A* to compact intervals in R₊, thereby obtaining boundedness of u from its
continuity.


³ The order in which results are listed here is not the order in which they are proved. Some of the results
listed later here (such as the Ramsey-Euler Equation) are used to prove some of those listed earlier
(such as the monotonicity of the sequence {cₜ}).

⁴ It must be emphasized that this result does not depend on f having any convexity properties.

⁵ This is true only of the limiting values y* and c*. The path by which {yₜ} and {cₜ} converge to y* and c*
does depend on u.


Assumption 1 The production function f satisfies the following conditions: 


1. (No free production) f(0) = 0.

2. (Continuity and Monotonicity) f is continuous and nondecreasing on R₊.

3. (Unproductivity at high investment levels) There is x̄ > 0 such that f(x) ≤ x
for all x ≥ x̄.


Parts 1 and 2 of this assumption are self-explanatory. Part 3 can be justified as a
version of diminishing marginal returns.

We assume that the initial state y₀ lies in some compact interval [0, ȳ] of R₊. Define
y* = max{x̄, ȳ}. Then, by Assumption 1, if y ∈ [0, y*], we have f(x) ∈ [0, y*]
for all x ∈ [0, y], and we may, without loss of generality, restrict analysis to [0, y*].
We now set

$$S = A = [0, y^*].$$

Secondly, we make the usual continuity assumption on the reward function:

Assumption 2 u: R₊ → R is continuous on R₊.


The tuple $\{S, A, \Phi, r, F, \delta\}$ now meets the requisite compactness and continuity
conditions to guarantee existence of an optimal strategy, and we have:


Theorem 12.20 There is a stationary optimal strategy $g: S \to A$ in the optimal
growth problem under Assumptions 1 and 2. The value function V is continuous on
S and satisfies the Bellman Equation at each $y \in S$:


$$V(y) = \max_{c \in [0, y]} \{u(c) + \delta V[f(y - c)]\}.$$


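The Bellman Equation of Theorem 12.20 can be approximated numerically by iterating the Bellman operator on a grid. The following is only a minimal sketch under stated assumptions, not a method taken from the text: the primitives $u(c) = \sqrt{c}$, $f(x) = \sqrt{x}$, $\delta = 0.9$, and $S = A = [0, 1]$ are hypothetical choices that satisfy Assumptions 1 and 2, and the function name `value_iteration`, the grid size, and the use of linear interpolation are illustrative.

```python
import math

def value_iteration(u, f, delta, y_max, n=61, tol=1e-8, max_iter=2000):
    """Iterate V <- max_c {u(c) + delta * V(f(y - c))} on a uniform grid of [0, y_max]."""
    grid = [y_max * i / (n - 1) for i in range(n)]
    step = y_max / (n - 1)

    def interp(V, y):
        # Piecewise-linear interpolation of V at y, clamped to [0, y_max].
        y = min(max(y, 0.0), y_max)
        k = min(int(y / step), n - 2)
        w = (y - grid[k]) / step
        return (1 - w) * V[k] + w * V[k + 1]

    V = [0.0] * n
    policy = [0.0] * n
    for _ in range(max_iter):
        V_new, policy = [], []
        for i, y in enumerate(grid):
            # Feasible consumption levels are approximated by grid points in [0, y].
            best_val, best_c = -math.inf, 0.0
            for c in grid[: i + 1]:
                val = u(c) + delta * interp(V, f(y - c))
                if val > best_val:
                    best_val, best_c = val, c
            V_new.append(best_val)
            policy.append(best_c)
        gap = max(abs(a - b) for a, b in zip(V, V_new))
        V = V_new
        if gap < tol:
            break
    return grid, V, policy

# Hypothetical primitives: u(c) = sqrt(c), f(x) = sqrt(x), delta = 0.9, S = A = [0, 1].
grid, V, policy = value_iteration(math.sqrt, math.sqrt, 0.9, 1.0)
```

Because the discretized operator is still a contraction of modulus $\delta$ in the sup-norm, the iteration converges from any starting guess; the returned `V` and `policy` are grid approximations of V and g only.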


12.6.3 Characterization of Optimal Strategies 


What more can we say about this problem without making additional assumptions? 
Consider two different initial states $y$ and $y'$ with $y < y'$. Let $c = g(y)$ be an optimal
action at $y$. Then, $c$ is a feasible (but not necessarily optimal) action at $y'$, since


$$\Phi(y) = [0, y] \subset [0, y'] = \Phi(y').$$

Moreover, since f is nondecreasing, consuming $c$ at $y'$ results in a period-1 stock
level of $f(y' - c) \ge f(y - c)$. Thus, in the continuation also, the optimal action
at $f(y - c)$ will be feasible (but not necessarily optimal) at $f(y' - c)$. It follows
by induction that the entire sequence of actions $\{c_t\}$ that results from $y$ under the
optimal strategy g is feasible (but not necessarily optimal) from $y'$. Therefore:6


Theorem 12.21 $V: S \to \mathbb{R}$ is nondecreasing on S.


Without additional structure, it is not possible to further characterize the solution; 
we proceed therefore to make assumptions of increasing degrees of restrictiveness 
on the structure. 


Assumption 3 $u: \mathbb{R}_+ \to \mathbb{R}$ is strictly increasing on $\mathbb{R}_+$.


Assumption 4 $u: \mathbb{R}_+ \to \mathbb{R}$ is strictly concave on $\mathbb{R}_+$.


Note that under Assumptions 1-3, Theorem 12.21 can be strengthened to the statement
that V is a strictly increasing function. When Assumptions 1-3 are combined
with the curvature condition in Assumption 4, we obtain a very strong conclusion: 
namely, that the marginal propensity to save must be nonnegative. That is, if the stock 
level increases from y to y’, then the optimal level of savings that results at y’ must 
be at least as large as the optimal level of savings that results at y. 

More formally, let $\xi(y) = y - g(y)$ denote the optimal savings (or investment)
at $y$ under the optimal strategy g. Since choosing the consumption level $c$ at $y$ is
equivalent to choosing the investment level $x = y - c$, we must have


$$V(y) = \max_{x \in [0, y]} \{u(y - x) + \delta V[f(x)]\}.$$


Theorem 12.22 Under Assumptions 1-4, $\xi$ is nondecreasing on S. That is, $y, y' \in S$
with $y < y'$ implies $\xi(y) \le \xi(y')$.


Proof Suppose not. Then, there exist $y, y' \in S$ with $y < y'$ and $\xi(y) > \xi(y')$. For
ease of notation, let $x$ and $x'$ denote $\xi(y)$ and $\xi(y')$, respectively.


6 An alternative, and somewhat lengthier, way to prove this result is to consider the set $C^*(S)$ of bounded,
continuous, and nondecreasing functions on S with the sup-norm. This space is complete, and a simple
argument shows that the map $T^*$ defined on $C^*(S)$ by


$$T^*w(y) = \max\{u(c) + \delta w[f(y - c)] \mid c \in [0, y]\}$$


is a contraction that maps $C^*(S)$ into itself. (We use the monotonicity of w and f to show that $T^*w$ is a
nondecreasing function on S.) Therefore, $T^*$ has a unique fixed point $V^*$. Since $V^* \in C^*(S)$, $V^*$ must
be continuous. But V is the unique continuous function that satisfies the Bellman Equation. Therefore, we
must have $V^* = V$, so V is nondecreasing.

Since $y' > y \ge x$, $x$ is a feasible level of savings at $y'$. Similarly, since $y \ge x$
and $x > x'$ by hypothesis, $x'$ is a feasible level of savings at $y$. Since $x$ and $x'$ are
the optimal savings levels at $y$ and $y'$, respectively, we must have


$$V(y) = u(y - x) + \delta V(f(x)) \ge u(y - x') + \delta V(f(x')),$$

$$V(y') = u(y' - x') + \delta V(f(x')) \ge u(y' - x) + \delta V(f(x)).$$


From these inequalities we obtain:


$$u(y - x) - u(y - x') \ge \delta[V(f(x')) - V(f(x))] \ge u(y' - x) - u(y' - x'),$$

and so

$$u(y - x') - u(y - x) \le u(y' - x') - u(y' - x).$$

But $(y - x')$ and $(y - x)$ are the same distance apart as $(y' - x')$ and $(y' - x)$.
However, $y < y'$, and u is increasing and strictly concave, which means we must
have $u(y - x') - u(y - x) > u(y' - x') - u(y' - x)$, a contradiction. This establishes
the theorem. □


We now make an additional convexity assumption. Assumptions 1-5 describe the
framework known as the concave one-sector growth model.


Assumption 5 f is concave on $\mathbb{R}_+$.


Note that we do not (yet) require strict concavity of f. Nonetheless, this is now 
sufficient to prove two strong results: the concavity of the value function V, and 
using this, the uniqueness of the optimal strategy g. The second result is an especially 
significant one. 


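Theorem 12.23 Under Assumptions 1-5, V is concave on S.

Proof Let $y, y' \in S$ with $y \ne y'$. Pick $\lambda \in (0, 1)$ and let $y_\lambda = \lambda y + (1 - \lambda)y'$. We
need to show that

$$V(y_\lambda) \ge \lambda V(y) + (1 - \lambda)V(y').$$

Let $\{c_t\}$ and $\{c'_t\}$ be the optimal consumption sequences from $y$ and $y'$ respectively,
and let $\{y_t\}$ and $\{y'_t\}$ denote the respective sequences of stock levels that arise. Of
course, we have

$$c_t \le y_t \quad \text{and} \quad c'_t \le y'_t$$

for all t. Now define for $t = 0, 1, 2, \ldots$,

$$c^\lambda_t = \lambda c_t + (1 - \lambda)c'_t.$$

We will establish that the sequence $\{c^\lambda_t\}$ is a feasible consumption sequence from $y_\lambda$.
Let $\{x^\lambda_t\}$ denote the sequence of investment levels that will arise if $\{c^\lambda_t\}$ is followed
from $y_\lambda$. The feasibility of $\{c^\lambda_t\}$ from $y_\lambda$ will be established if we can show that
$y_\lambda \ge c^\lambda_0$ and, for $t = 0, 1, 2, \ldots$,

$$f(x^\lambda_t) \ge c^\lambda_{t+1}.$$

The first of the required inequalities holds since

$$y_\lambda = \lambda y + (1 - \lambda)y' \ge \lambda c_0 + (1 - \lambda)c'_0 = c^\lambda_0.$$

Using $x^\lambda_0 = y_\lambda - c^\lambda_0$ and the concavity of f, we also have

$$\begin{aligned}
f(x^\lambda_0) &= f[\lambda(y - c_0) + (1 - \lambda)(y' - c'_0)] \\
&\ge \lambda f(y - c_0) + (1 - \lambda) f(y' - c'_0) \\
&\ge \lambda c_1 + (1 - \lambda)c'_1 \\
&= c^\lambda_1.
\end{aligned}$$

The obvious induction argument now completes this step.

Since $\{c^\lambda_t\}$ is a feasible, but not necessarily optimal, consumption sequence from
$y_\lambda$, and since $u[\lambda c_t + (1 - \lambda)c'_t] \ge \lambda u(c_t) + (1 - \lambda)u(c'_t)$ for each t, we have

$$\begin{aligned}
V(y_\lambda) &\ge \sum_{t=0}^{\infty} \delta^t u(c^\lambda_t) \\
&= \sum_{t=0}^{\infty} \delta^t u[\lambda c_t + (1 - \lambda)c'_t] \\
&\ge \lambda \sum_{t=0}^{\infty} \delta^t u(c_t) + (1 - \lambda) \sum_{t=0}^{\infty} \delta^t u(c'_t) \\
&= \lambda V(y) + (1 - \lambda)V(y'),
\end{aligned}$$

which completes the proof. □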
Theorem 12.24 Under Assumptions 1-5, the correspondence of maximizers G of
the Bellman Equation is single-valued on S. Therefore, there is a unique optimal
strategy g, and g is a continuous function on S.

Proof By hypothesis, u is strictly concave and f is concave. By Theorem 12.23, V
inherits this concavity. Since V is also nondecreasing (Theorem 12.21), $V[f(y - c)]$ is
concave in c. Thus, the RHS of the Bellman Equation


$$\{u(c) + \delta V[f(y - c)]\}$$


is strictly concave as a function of c. The single-valuedness of G follows. As a
single-valued correspondence, G admits a unique selection g; since G is a USC correspondence,
g must be a continuous function. □


It is very important to emphasize that this result shows the uniqueness of the
optimal strategy itself, not just that there is a unique stationary optimal strategy. We
leave it as an exercise to the reader to explain why this is the case.7

Finally, we add the following differentiability assumptions and obtain the
differentiable concave model:


Assumption 6 u is $C^1$ on $\mathbb{R}_{++}$ with $\lim_{c \downarrow 0} u'(c) = \infty$.

Assumption 7 f is $C^1$ on $\mathbb{R}_{++}$ with $\lim_{x \downarrow 0} f'(x) > 1/\delta$.


Assumption 8 f is strictly concave on $\mathbb{R}_+$.


The assumptions that $u'(0) = +\infty$ and $f'(0) > \delta^{-1}$ are called the Inada conditions
on u and f, respectively. They ensure essentially that the agent will try to
keep consumption levels strictly positive in all periods. If consumption in period t is
positive and in period (t + 1) is zero, the agent can gain in total utility by transferring
a “small” amount from period t to period (t + 1): the fall in marginal utility in period
t will be more than compensated for by the gain in marginal utility in period (t + 1).
However, this argument is not quite complete, since f may be so “unproductive” as
to make transferring any amount across periods unattractive. The Inada assumption
on f rules out this possibility. The following result formalizes these ideas:


Theorem 12.25 Under Assumptions 1-7, it is the case that for all $y > 0$, we have
$0 < g(y) < y$, i.e., the solution to the model is “interior.”


Remark It is important to note that the proof of this result makes no use of the
concavity of f. □


7The following is a sketch of the required arguments. We have seen that a strategy $\pi$ in a dynamic programming
problem is optimal if and only if the payoff $W(\pi)$ under $\pi$ satisfies the Bellman Equation. Therefore,
the actions prescribed by $\pi$ at any state must solve the Bellman Equation. Since V is the only solution
to the Bellman Equation, it follows that if there is a unique action maximizing the right-hand side of the
Bellman Equation, then there is also a unique optimal strategy.

Proof Pick any $y > 0$. Let $x = y - g(y)$ be the optimal investment at $y$, and
$x' = y' - g(y')$ be the optimal investment at $y' = f(y - g(y))$, i.e., in the period
following $y$. Then $x$ must solve the following two-period maximization problem:

$$\max_{z \in [0, y]} \{u(y - z) + \delta u(f(z) - x')\}.$$

The reason is simple: suppose x was dominated in this problem by some z. Then, by 
consuming (y — z) at the state y, and f(z) — x’ at the state f(z), the utility over the 
first two periods beginning from y is strictly larger than following the prescriptions 
of the optimal plan in these two periods. But in either case, the investment at the end 
of the second period is x’; therefore, the continuation possibilities after the second 
period are the same in either case. This means x could not have been an optimal 
investment level at y, a contradiction. 

Now, if it were not true that the solution x to this two-period maximization problem 
lies in (0, y), then we must have x = 0 or x = y. 


Case 1: ($x = 0$)


The first-order conditions for a maximum in this case are:

$$\delta u'(f(x) - x') f'(x) \le u'(y - x).$$


If $x = 0$, then $f(x) = 0$, so $x' = 0$. But these conditions then reduce to the
contradiction


$$+\infty = \delta u'(0) f'(0) \le u'(y) < u'(0) = +\infty.$$


Case 2: ($x = y$)


In this case, the FOC's are (using $x = y$)

$$\delta u'(f(y) - x') f'(y) \ge u'(0).$$


This is not possible unless $x' = f(y)$. But $x' = f(y)$ would similarly be impossible
as an optimal choice unless $x'' = f(y')$, where $y'' = f(x')$, and $x'' = y'' - g(y'')$,
and so on. Thus the only way this situation can arise is if on the entire path from
$y$, we have zero consumption. But this is evidently suboptimal since u is strictly
increasing.


Thus, Cases 1 and 2 are both impossible, and the result is established. □


As an immediate consequence of Theorem 12.25, we obtain the Ramsey-Euler
Equation, the first-order condition for optimality in the one-sector growth model:

Theorem 12.26 (Ramsey-Euler Equation) Suppose Assumptions 1-7 are met.
Let $y > 0$, and $y^* = f(y - g(y))$. Then, g satisfies


$$u'[g(y)] = \delta u'[g(y^*)] f'[y - g(y)].$$


Proof We have shown that at each $y$, $g(y)$ must solve

$$\max_{c \in [0, y]} \{u(c) + \delta u[f(y - c) - x']\},$$


and that the solution to this problem is interior. The Ramsey-Euler Equation is simply
the first-order condition for an interior optimal solution to this problem. □


Remark The Ramsey-Euler Equation is often expressed more elegantly as follows.
Let $y > 0$ be any given initial state, and let $\{c_t\}$ and $\{x_t\}$ denote the optimal
consumption and savings sequences that arise from $y$. Then, it must be the case that
at each t:


$$u'(c_t) = \delta u'(c_{t+1}) f'(x_t).$$ □
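As an illustration of the Ramsey-Euler Equation, consider the log and power specification $u(c) = \log c$, $f(x) = x^\alpha$ that also appears in the exercises. For this specification the policy $g(y) = (1 - \alpha\delta)y$ is the well-known closed-form solution, and the sketch below checks numerically that it satisfies $u'(c_t) = \delta u'(c_{t+1}) f'(x_t)$ along a path. The parameter values and the helper name `euler_residuals` are assumptions made for illustration; note also that $\log c$ is unbounded at 0, so this example lies outside Assumption 2 and is offered only as a check of the first-order condition.

```python
# Hypothetical parameter values for the illustration.
alpha, delta = 0.3, 0.95

def f(x):
    return x ** alpha                      # production function f(x) = x^alpha

def f_prime(x):
    return alpha * x ** (alpha - 1)        # f'(x)

def u_prime(c):
    return 1.0 / c                         # u(c) = log c, so u'(c) = 1/c

def g(y):
    return (1 - alpha * delta) * y         # closed-form policy for this example

def euler_residuals(y0, periods=20):
    """Return u'(c_t) - delta * u'(c_{t+1}) * f'(x_t) along the path from y0."""
    residuals, y = [], y0
    for _ in range(periods):
        c = g(y)
        x = y - c                          # savings x_t = y_t - c_t
        y_next = f(x)
        c_next = g(y_next)
        residuals.append(u_prime(c) - delta * u_prime(c_next) * f_prime(x))
        y = y_next
    return residuals

residuals = euler_residuals(0.5)
```

Each residual should vanish up to floating-point rounding, since substituting $x_t = \alpha\delta y_t$ into the right-hand side gives $1/[(1 - \alpha\delta) y_t]$, which is $u'(c_t)$.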


Using the Ramsey-Euler Equation, it is now an easy task to show that the sequence
of consumption levels from any initial state must also be monotone:


Theorem 12.27 Under Assumptions 1-7, g is increasing on S. That is, $y > \hat{y}$ implies
$g(y) > g(\hat{y})$.


Proof For notational ease, let $c = g(y)$ and $\hat{c} = g(\hat{y})$. Let $y_1$ and $\hat{y}_1$ denote,
respectively, the output levels that result from $y$ and $\hat{y}$ one period hence, i.e.,
$y_1 = f(y - c)$ and $\hat{y}_1 = f(\hat{y} - \hat{c})$. Finally, let $c_1 = g(y_1)$ and $\hat{c}_1 = g(\hat{y}_1)$. By the
Ramsey-Euler Equation, we have


$$u'(c) = \delta u'(c_1) f'(y - c),$$

$$u'(\hat{c}) = \delta u'(\hat{c}_1) f'(\hat{y} - \hat{c}).$$

Therefore,

$$\frac{u'(c)}{u'(\hat{c})} = \frac{u'(c_1) f'(y - c)}{u'(\hat{c}_1) f'(\hat{y} - \hat{c})}.$$

Suppose $c \le \hat{c}$. Then, since u is strictly concave, $u'(c) \ge u'(\hat{c})$. Moreover,
$y - c > \hat{y} - \hat{c}$, so $f'(y - c) < f'(\hat{y} - \hat{c})$. Therefore, we must have $u'(c_1) > u'(\hat{c}_1)$, or
$c_1 < \hat{c}_1$.

In summary, we have shown that if $y > \hat{y}$ and $c \le \hat{c}$, then these inequalities
are repeated in the next period also; that is, $y_1 > \hat{y}_1$ and $c_1 < \hat{c}_1$. Iterating on this
argument, it is seen that the sequence of consumption levels from $y$ is dominated in
every period by the sequence of consumption levels from $\hat{y}$. This means the total
utility from $y$ is strictly smaller than that from $\hat{y}$, a contradiction to the fact that V
is strictly increasing. □


Finally, define a steady state under the optimal policy g to be a state $y \in S$ with
the property that


$$y = f(y - g(y)).$$


That is, $y$ is a steady state under the optimal policy if, whenever the initial state is $y$,
the system remains at $y$ forever under the optimal policy. Also define the golden-rule
state $y^*_\delta$ to be equal to $f(x^*_\delta)$, where $x^*_\delta$ is the unique solution to


$$\delta f'(x) = 1.$$


Finally, define the golden-rule consumption level $c^*_\delta$ to be that value of c that would
make $y^*_\delta$ a steady state, i.e.,


$$c^*_\delta = y^*_\delta - x^*_\delta.$$

Note that $y^*_\delta$, $x^*_\delta$, and $c^*_\delta$ are all independent of the utility function u, and depend only
on f and the value of $\delta$.


Theorem 12.28 Given any $y \in S$, define the sequence of states $\{y_t(y)\}_{t=0}^{\infty}$ from $y$
under the optimal policy g as $y_0(y) = y$, and for $t \ge 0$, $y_{t+1}(y) = f[y_t(y) - g(y_t(y))]$.
Let $\{c_t(y)\}$ denote the corresponding sequence of consumption levels.
Then,


$$y_t(y) \to y^*_\delta \quad \text{and} \quad c_t(y) \to c^*_\delta.$$

Proof For notational ease, fix $y$ and suppress all dependence on $y$. Since the investment
function $\xi$ is nondecreasing on S, and since f is also a nondecreasing function,
the sequence $\{y_t\}$ is a monotone sequence. Since g is nondecreasing, the sequence
$\{c_t\}$ is also monotone. Therefore, both the sequences have limits, denoted, say, $y^*$
and $c^*$. Since $y_{t+1} = f(y_t - c_t)$ for all t, we must have


$$y^* = f(y^* - c^*)$$

by the continuity of f. Moreover, since

$$u'(c_t) = \delta u'(c_{t+1}) f'(y_t - c_t),$$


the continuous differentiability of f and u implies that in the limit
$u'(c^*) = \delta u'(c^*) f'(y^* - c^*)$, or


$$1 = \delta f'(y^* - c^*).$$

Thus, $y^*$ and $c^*$ are also solutions to the two equations that define $y^*_\delta$ and $c^*_\delta$. Since
these solutions are unique, we are done. □
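The convergence asserted in Theorem 12.28 can be illustrated with a small simulation. The sketch below again assumes the hypothetical specification $f(x) = x^\alpha$ with the well-known policy $g(y) = (1 - \alpha\delta)y$ (which is the optimal policy when $u(c) = \log c$); the golden-rule investment level is found by bisection on $\delta f'(x) = 1$. All names and parameter values are illustrative, not taken from the text.

```python
# Hypothetical parameter values for the illustration.
alpha, delta = 0.3, 0.95

def f(x):
    return x ** alpha

def f_prime(x):
    return alpha * x ** (alpha - 1)

def g(y):
    return (1 - alpha * delta) * y         # assumed optimal policy (log utility case)

def golden_rule(f, f_prime, delta, lo=1e-9, hi=1.0, iters=200):
    """Solve delta * f'(x) = 1 by bisection; f' is decreasing since f is strictly concave."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta * f_prime(mid) > 1.0:
            lo = mid                       # f' still too large: the root lies to the right
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    y_star = f(x_star)
    return y_star, x_star, y_star - x_star

def simulate(y0, periods=60):
    """Return the state path y_0, y_1, ... generated by y_{t+1} = f(y_t - g(y_t))."""
    path = [y0]
    for _ in range(periods):
        path.append(f(path[-1] - g(path[-1])))
    return path

y_star, x_star, c_star = golden_rule(f, f_prime, delta)
low_path = simulate(0.1)                   # starts below the golden-rule state
high_path = simulate(1.0)                  # starts above the golden-rule state
```

In this example the path from below rises monotonically and the path from above falls monotonically, and both approach `y_star`; the limit depends on f and $\delta$ only, while the policy used to generate the path depends on u.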


12.7 Exercises 


1. Let $(x_n)$ be a Cauchy sequence. Show that if $(x_n)$ has a convergent subsequence
then the sequence is itself convergent.


2. Determine whether or not the following subsets of R are complete: 


(a) $[0, 1]$
(b) $\mathbb{Q}$
(c) $\mathbb{Q}^c$, the set of irrational numbers
(d) $\mathbb{N}$
(e) $\{1, 1/2, 1/3, \ldots, 1/n, \ldots\}$
(f) $\{1, 1/2, 1/3, \ldots, 1/n, \ldots\} \cup \{0\}$
(g) $[-1, 2)$

3. Let $f: \mathbb{R} \to \mathbb{R}$ be defined as $f(x) = ax + b$. What are the values of $a, b \in \mathbb{R}$
that make f a contraction?


4. Let $X = (1, \infty)$. Let $f: X \to \mathbb{R}$ be given by


(a) Show that if $a \in (1, 3)$, then f maps X into itself, i.e., $f(x) \in X$ for all
$x \in X$.


(b) Show that f is actually a contraction if $a \in (1, 3)$. Find the fixed point as a
function of a for each a.


(c) What about $a = 1$ or $a = 3$?
(d) Is X complete?


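5. Let $f: \mathbb{R} \to \mathbb{R}$ be defined by

$$f(x) = \begin{cases} x - \frac{1}{2}e^x, & x \le 0 \\ -\frac{1}{2} + \frac{1}{2}x, & x > 0. \end{cases}$$

Show that f satisfies the condition that $|f(x) - f(y)| < |x - y|$ for all $x, y \in \mathbb{R}$,
but that f has no fixed points.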


6. Let X be a finite set, and $f_n: X \to \mathbb{R}$. Suppose $f_n$ converges pointwise to f, i.e.,
for each $x \in X$, $f_n(x) \to f(x)$. Show that $f_n$ converges uniformly to f.

7. Let $X = [0, 1]$. For $n \in \mathbb{N}$ define $f_n: X \to \mathbb{R}$ by

$$f_n(x) = \begin{cases} nx, & x < 1/n \\ 1, & x \ge 1/n. \end{cases}$$

(a) Show that $f_n$ converges pointwise to f where f is given by

$$f(x) = \begin{cases} 0, & x = 0 \\ 1, & x > 0, \end{cases}$$

i.e., show that for each $x \in X$, $f_n(x) \to f(x)$.
(b) Let d be the sup-norm metric. What is $d(f_n, f)$? Show that $d(f_n, f)$ does
not converge to 0, i.e., $f_n$ does not converge uniformly to f.


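8. Let $X = [0, 1]$ and for each $n \in \mathbb{N}$ define $f_n: X \to \mathbb{R}$ as

$$f_n(x) = x^n.$$

Show that $f_n$ converges pointwise to $f: X \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 0, & x < 1 \\ 1, & x = 1. \end{cases}$$

Does $f_n$ converge uniformly to f?

9. For each $n \in \mathbb{N}$ let $f_n: \mathbb{R} \to \mathbb{R}$ be defined by

$$f_n(x) = \begin{cases} 1 - \frac{1}{n}|x|, & |x| < n \\ 0, & |x| \ge n. \end{cases}$$

Show that $f_n$ converges pointwise to the constant function $f(x) = 1$. Does $f_n$
converge uniformly to f?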


10. Let $X \subset \mathbb{R}_+$ be a compact set and let $f: X \to X$ be a continuous and increasing
function. Prove that f has a fixed point.

11. Let $X = [0, 1]$, and $f: X \to X$ be an increasing but not necessarily continuous
function. Must f have a fixed point?

12. If $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ are both contraction mappings, what can we say
about $f \circ g$?

13. Is it possible to have two discontinuous functions whose composition is a contraction
mapping?

14. If $f: \mathbb{R} \to \mathbb{R}$ is a differentiable contraction mapping, what can we say about the
value of $df'(x)/dx$?

15. A function $f: X \to Y$ is said to be an onto function, if for any $y \in Y$, there is
some $x \in X$ such that $f(x) = y$. Show that the mapping $f(x) = (1 - x)^{1/2}$ is
an onto mapping from $[0, 1]$ to $[0, 1]$. Find the fixed points of this mapping.


16. Let a mapping of the interval $[0, 1]$ onto itself be defined by $f(x) = 4x - 4x^2$.

(a) Sketch the graph of the function and the diagonal line $y = x$.
(b) Is the mapping one-to-one in the interval?
(c) Find the fixed points of the mapping.

17. Give an example of a mapping of the interval $[0, 1]$ into itself having precisely
two fixed points, namely 0 and 1. Can such a function be a contraction mapping?

18. Give an example of an onto function of the open interval $(0, 1)$ into itself having
no fixed points.

19. Let $X = [0, 1]$. Find a function $f: X \to X$ which has 0 as its only fixed point,
or show that no such example is possible.

20. Let $S \subset \mathbb{R}^n$. Let $BU(S)$ be the set of all bounded upper-semicontinuous functions
w mapping S into $\mathbb{R}$. Give $BU(S)$ the sup-norm metric and show that
$BU(S)$ is a complete metric space under this metric.

21. Let $S = \{0, 1\}$, $A = \mathbb{R}_+$, and $\Phi(s) = A$ for any $s \in S$. Let the state transition
function be given by


0 ifs =OQora < 1/2 
f(a) = ; 
l ifs =landa > 1/2. 
Letr:S x A —» R be given by 
—a ifs=0 
r(s,a) = ; 
o ifs=1. 


(a) Show that f is not continuous on $S \times A$.
(b) Suppose $\delta = 0$, i.e., the future is worthless. Then, the problem is simply to
maximize $r(s, a)$ for each $s \in S$. What is the solution? What is $V(0)$? $V(1)$?


(c) Suppose $\delta \in (0, 1)$. Show that a solution exists. Find the solution as a
function of $\delta$.


(d) What “should” the solution be for $\delta = 1$?
(e) Suppose that f is modified to 


0 ifs =Oora < 1/2 


(s,a} = . 
f : ifs = l anda > 1/2. 


Does a solution exist for all $\delta \in (0, 1)$? Why or why not?


22. A wholesaler faces a known demand of k units per period for its product, at
a given price p. At any given time, a maximum of $l \in \mathbb{N}$ ($l > k$) units of the
product can be ordered from the manufacturer at a cost of c per unit. There is also
a fixed cost $Z > 0$ of placing the order. Ordered amounts are delivered instantly.
Any amount of the product that remains unsold in a given period can be stored
at a cost of s per unit. Assuming that the wholesaler discounts future profits by
a factor $\delta \in (0, 1)$, and has an initial stock of x units of the product, describe the
optimization problem as a stationary dynamic programming problem, and write
down the Bellman Equation. Is it possible to assert the existence of an optimal
strategy? Why or why not?


23. Consider the one-sector model of optimal growth. Suppose the only restrictions
we place on the utility function $u: \mathbb{R}_+ \to \mathbb{R}$ and the production function
$f: \mathbb{R}_+ \to \mathbb{R}_+$ are:


(a) u and f are both continuous on $\mathbb{R}_+$.
(b) $f(x) \in [0, 1]$ for all $x \in [0, 1]$.


Let $S = [0, 1]$. Suppose the problem admits free disposal. That is, the agent can
decide on the amount $c_t$ of consumption and $x_t$ of investment out of the available
stock $y_t$, and costlessly dispose of the remaining quantity $y_t - c_t - x_t$. Then, given
a discount factor $\delta \in (0, 1)$, the agent solves:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{t=0}^{\infty} \delta^t u(c_t) \\
\text{subject to} \quad & y_0 = y \in S \\
& y_{t+1} = f(x_t) \\
& c_t + x_t \le y_t \\
& c_t, x_t \ge 0.
\end{aligned}$$

(a) Does this problem have a solution? Why or why not?
(b) Describe the Bellman Equation for this problem.
(c) Show that the value function in (b) is nondecreasing on S.


24. Suppose that in the optimal growth problem we had $u(c) = c^\alpha$, $\alpha \in (0, 1)$, and
$f(x) = x$. Solve for the optimal policy g with $S = [0, 1]$ and $\delta \in (0, 1)$.

Hint 1: Use the FOC's for the problem.

Hint 2: g is linear, i.e., $g(y) = ay$ for some constant a. Solve for a.


25. Redo the last question, assuming $f(x) = x^\alpha$ for $\alpha \in (0, 1]$, and $u(c) = \log c$.
(Note that $\log c$ is unbounded at 0, so our sufficient conditions for the existence of
an optimal strategy do not apply here. Nonetheless, it turns out that the Bellman
Equation does hold, and a solution can be calculated using this equation. The
value function V has the form $V(y) = A + B \log y$. Using this on both sides of
the Bellman Equation, solve for A and B, and thereby for an optimal policy.)

26. Consider the following optimization problem:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{t=0}^{\infty} u(a_t) \\
\text{subject to} \quad & \sum_{t=0}^{\infty} a_t \le s \\
& a_t \ge 0.
\end{aligned}$$

This problem, which is a dynamic programming problem with $\delta = 1$, is known
as the “cake-eating” problem. We begin with a cake of size s, and have to allocate
this cake to consumption in each period of an infinite horizon.


(a) Show that the problem may be rewritten as a dynamic programming problem.
That is, describe formally the state and action spaces, the reward function,
the transition function, and the feasible action correspondence.

(b) Show that if $u: A \to \mathbb{R}$ is increasing and linear, i.e., $u(a) = ka$ for some
$k > 0$, then the problem always has at least one solution. Find a solution.

(c) Show that if $u: A \to \mathbb{R}$ is increasing and strictly concave, then the problem
has no solution.

(d) What happens to part (c) if we assume $0 < \delta < 1$, i.e., if the objective
function is given by $\sum_{t=0}^{\infty} \delta^t u(a_t)$?

27. In each period $t = 0, 1, 2, \ldots$, of an infinite horizon, a fishery must decide on
the quantity q of fish to be caught for selling in the market that period. (All fish
caught must be sold.) The firm obtains a constant price p per unit of fish it sells.
Catching fish requires effort by the firm. In each period t, the firm must decide
on its effort level $e_t \in [0, 1]$ that it wishes to expend in this direction. The total
catch $q_t$ is then determined as a function $q_t = h(e_t, y_t)$ of the effort level $e_t$
the firm expends and the number of fish $y_t$ in the lake that period. Assume that
$h: [0, 1] \times \mathbb{R}_+ \to \mathbb{R}_+$ is a continuous function satisfying $h(e, y) \in [0, y]$ for
all $y \in \mathbb{R}_+$, $e \in [0, 1]$. Expending an effort level $e_t$ in period t also results in a
cost to the firm of $c(e_t)$, where $c: [0, 1] \to \mathbb{R}_+$ is continuous.

Lastly, the population of fish $y_t - q_t$ not caught in period t grows to a population
$y_{t+1}$ in period $t + 1$ according to a growth function f, as


$$y_{t+1} = f(y_t - q_t).$$


Assume that (i) $f: \mathbb{R}_+ \to \mathbb{R}_+$ is a continuous function, (ii) $f(0) = 0$ (no fish
in the lake this period implies no fish next period), and (iii) f maps $[0, \bar{x}]$ into
$[0, \bar{x}]$ for some $\bar{x} > 0$. Assume also that the initial stock of fish is $y_0 \in [0, \bar{x}]$.
Finally, assume that the firm discounts future profit levels by a factor $\delta \in (0, 1)$.
Set up the dynamic programming problem faced by the firm. Show that this
problem admits a stationary optimal policy. Describe the Bellman Equation.

28. A firm's income in any period t depends on the level of capital stock $x_t$ that it
has that period, and the amount of investment $i_t$ it undertakes that period, and is
given by $g(x_t) - h(i_t)$, where g and h are continuous functions. The capital stock
next period is then given by $\beta x_t + i_t$, where $\beta \in (0, 1)$ is a depreciation factor.
In no period is investment allowed to exceed $b > 0$, where b is some fixed level.
Investment is allowed to be negative but cannot be smaller than $-\beta x$, where x is
the capital stock of that period. The firm discounts future income by $\delta \in (0, 1)$.
Given that the firm begins with an initial capital stock of $x_0 > 0$, show that the
dynamic programming problem facing the firm has a solution, and describe the
Bellman Equation. Explain all your steps clearly.


Appendix A 


Set Theory and Logic: An Introduction 


This appendix discusses the basic rules of set theory and logic. For a more leisurely 
and detailed discussion of this material, we refer the reader to Halmos (1960), or the 
excellent introductory chapter of Munkres (1975). 


A.1 Sets, Unions, Intersections 


We adopt the naive point of view regarding set theory. That is, we shall assume that
what is meant by a set is intuitively clear. Throughout this chapter, we shall denote
sets by capital letters such as A, B, X, and Y, and elements of these sets by lowercase
letters such as a, b, x, and y. That an object a belongs to a set A is denoted by


$$a \in A.$$

If a is not an element of A, we shall write

$$a \notin A.$$


If A is the set of all elements from some collection X which also satisfy some
property $\Pi$, we will write this as


$$A = \{x \in X \mid x \text{ satisfies the property } \Pi\}.$$

If X is understood, then we will write this as simply

$$A = \{x \mid x \text{ satisfies the property } \Pi\}.$$


It may be that there is no $x \in X$ that satisfies the property $\Pi$. For instance, there is
no real number x that satisfies the property that $x^2 + 1 = 0$. In such a case, A will
contain no elements at all. A set which contains no elements will be called the empty
set, denoted $\emptyset$.

If every element of a set B is also an element of a set A, we shall say that B is a
subset of A, and write


$$B \subset A.$$


We will also say in this case that A is a superset of B, and denote the relationship
$B \subset A$ alternatively by


$$A \supset B.$$


The set B will be said to be a proper subset of A if $B \subset A$, and there is some
$x \in A$ with $x \notin B$. In words, B is a proper subset of A if every element of B is also
in A, but A contains at least one element that is not in B.

Two sets A and B are said to be equal, written $A = B$, if every element of A is
also an element of B, and vice versa. That is, $A = B$ if we have both


$$A \subset B \quad \text{and} \quad B \subset A.$$


If two sets A and B are not equal, we will write this as $A \ne B$. Note that B is a
proper subset of A if and only if we have $B \subset A$ and $B \ne A$.

The union of two sets A and B, denoted $A \cup B$, is the set which consists of all
elements which are either in A or in B (or both):


$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}.$$


The intersection of two sets A and B, denoted $A \cap B$, is the set which consists of
all elements which belong to both A and B:


$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}.$$

If $A \subset X$, the complement of A in X, denoted $A^c$, is defined as

$$A^c = \{x \in X \mid x \notin A\}.$$


If the reference set X is understood, as it usually is in applications,1 we will omit the
words “in X,” and refer to $A^c$ as simply the complement of A.
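The definitions of this section can be tried out on small finite sets. The following sketch uses Python's built-in `set` type; the reference set X and the sets A and B are hypothetical examples chosen only for illustration.

```python
# Hypothetical reference set X and two subsets defined by properties.
X = set(range(10))                       # X = {0, 1, ..., 9}
A = {x for x in X if x % 2 == 0}         # A = {x in X | x is even}
B = {x for x in X if x < 4}              # B = {x in X | x < 4}

union = A | B                            # elements in A or in B (or both)
intersection = A & B                     # elements in both A and B
complement_A = X - A                     # complement of A in X

# A property that no element of X satisfies yields the empty set.
empty = {x for x in X if x * x + 1 == 0}

is_subset = intersection <= A            # every element of A ∩ B is in A
is_proper_subset = intersection < A      # ... and A has an element not in A ∩ B
```

Here `<=` tests the subset relation and `<` tests the proper-subset relation, matching the distinction drawn in the text.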


A.2 Propositions: Contrapositives and Converses 
Given two propositions P and Q, the statement “If P, then Q” is interpreted as the 
statement that if the proposition P is true, then the statement Q is also true. We 


1For instance, by the expression “the complement of the negative reals,” one usually means the complement
of the negative reals in $\mathbb{R}$, which is the set of nonnegative reals.

denote this by

$$P \Rightarrow Q.$$


We will also say in this case that “P implies Q.”

We stress the point that $P \Rightarrow Q$ only says that if P is true, then Q is also true. It
has nothing to say about the case where P is not true; in this case, Q could be either
true or false. For example, if P is the statement $x > 0$ and Q is the statement that
$x^2 > 0$, then it is certainly true that


$$P \Rightarrow Q,$$


since the square of a positive number is positive. However, Q can be true even if P
is not true, since the square of a negative number is also positive.

Given a statement of the form “if P, then Q,” its contrapositive is the statement
that “if Q is not true, then P is not true.” If we let $\sim Q$ denote the statement that Q
is not true (we will simply call this “not Q,” for short), then the contrapositive of the
statement


$$P \Rightarrow Q$$

is the statement

$$\sim Q \Rightarrow \sim P.$$

For example, the contrapositive of the statement

If x is positive, then $x^2$ is positive

is the statement

If $x^2$ is not positive, then x is not positive.


A statement and its contrapositive are logically equivalent. That is, if the statement
is true, then the contrapositive is also true, while if the statement is false, so is the
contrapositive. This is easy to see. Suppose, first, that $P \Rightarrow Q$ is true. Then, if Q
is false, P must also be false: if P were true, then, by $P \Rightarrow Q$, Q would have to
be true, and a statement cannot be both true and false. Thus, if $P \Rightarrow Q$ holds, then
$\sim Q \Rightarrow \sim P$ also holds. Now, suppose $P \Rightarrow Q$ is false. The only way this can
happen is if P were true and Q were false. But this is precisely the statement that
$\sim Q \Rightarrow \sim P$ is not true. Therefore, if $P \Rightarrow Q$ is false, so is $\sim Q \Rightarrow \sim P$.

An important implication of the logical equivalence of a statement and its contrapositive
is the following: if we are required to prove that $P \Rightarrow Q$, the result can be
regarded as established if we show that $\sim Q \Rightarrow \sim P$.

The converse of the statement $P \Rightarrow Q$ is the statement that


$$Q \Rightarrow P,$$


that is, the statement that “if Q, then P.”

There is no logical relationship between a statement and its converse. As we have
seen, if P is the proposition that $x > 0$ and Q is the proposition that $x^2 > 0$, then it
is certainly true that


$$P \Rightarrow Q,$$


but the converse

$$Q \Rightarrow P$$


is false: x could be negative and still satisfy $x^2 > 0$.

If a statement and its converse both hold, we express this by saying that “P if and
only if Q,” and denote this by


$$P \Leftrightarrow Q.$$


For example, if P is the proposition that $x > 0$ and Q is the proposition that $x^3 > 0$,
we have $P \Leftrightarrow Q$.
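The relationships among a statement, its contrapositive, and its converse can be checked mechanically by enumerating all truth assignments to P and Q. The sketch below is one way to do this; the helper names `implies` and `iff` are illustrative.

```python
from itertools import product

def implies(p, q):
    # "If P, then Q" is false only when P is true and Q is false.
    return (not p) or q

def iff(p, q):
    # "P if and only if Q" holds when P and Q have the same truth value.
    return p == q

rows = list(product([False, True], repeat=2))   # all four truth assignments to (P, Q)

# A statement and its contrapositive agree on every row.
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# A statement and its converse do not agree on every row.
converse_equivalent = all(implies(p, q) == implies(q, p) for p, q in rows)

# "P iff Q" is the same as the statement together with its converse.
iff_matches = all(
    iff(p, q) == (implies(p, q) and implies(q, p)) for p, q in rows
)
```

The rows on which the statement and its converse differ are exactly those where one of P, Q is true and the other false.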


A.3 Quantifiers and Negation 


There are two kinds of logical quantifiers, the universal or “for all” quantifier, and the
existential or “there exists” quantifier. The former is used to denote that a property $\Pi$
holds for every element a in some set A; the latter to denote that the property holds
for at least one element a in the set A.

The negation of a proposition P is its denial $\sim P$. If the proposition P involves a
universal quantifier, then its negation involves an existential quantifier: to deny the
truth of a universal statement requires us to find just one case where the statement
fails. For instance, let A be some set and let $\Pi(a)$ be some property defined for
elements $a \in A$. Suppose P is the proposition of the form


For all $a \in A$, property $\Pi(a)$ holds.


Then, P is false if there is just a single element $a \in A$ for which the property $\Pi(a)$
does not hold. Thus, the negation of P is the proposition


There exists $a \in A$ such that property $\Pi(a)$ does not hold.

Similarly, the negation of an existential quantifier involves a universal quantifier:
to deny that there is at least one case where the proposition holds requires us to show
that the proposition fails in every case. That is, if Q is a proposition of the form


There exists $b \in B$ such that property $\Pi'(b)$ holds,

its negation is the proposition

For all $b \in B$, property $\Pi'(b)$ does not hold.


As a concrete example, consider the following. Given a real number x, let $\Pi(x)$
be the property that $x^2 > 0$. Let P be the proposition that “Property $\Pi(x)$ holds for
every real number x.” In the language of quantifiers, we would express P as


For every $x \in \mathbb{R}$, $x^2 > 0$.


P is evidently negated if there is at least one real number whose square is not strictly
positive. So the negation $\sim P$ is the statement


There is $x \in \mathbb{R}$ such that $x^2 \not> 0$.


When multiple quantifiers are involved in a statement, the situation gets a little 
more complicated. If all the quantifiers in a given proposition are of the same type (i.e., 
they are all universal, or are all existential) the order of the quantifiers is immaterial. 
For instance, the statement 


For all $x \in \mathbb{R}$, for all $y \in \mathbb{R}$, $(x + y)^2 = x^2 + 2xy + y^2$,

is the same as the statement

For all $y \in \mathbb{R}$, for all $x \in \mathbb{R}$, $(x + y)^2 = x^2 + 2xy + y^2$.


However, the order of the quantifiers becomes significant if quantifiers of different
types are involved. The statement


For every $x > 0$, there exists $y > 0$ such that $y^2 = x$

is most definitely not the same as the statement that

There exists $y > 0$ such that for all $x > 0$, $y^2 = x$.


In fact, while the first statement is true (it asserts essentially that every positive real 
number has a positive square root), the second is false (it claims that a single fixed 
real number is the square root of every positive number). 
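
The effect of swapping the two quantifiers can be checked mechanically on small finite sets. The following is a minimal sketch in Python; the lists `xs` and `ys` are hypothetical finite stand-ins for the positive reals of the text (an assumption made so that the quantifiers can be evaluated by enumeration), with `all` playing the role of the universal quantifier and `any` that of the existential one.

```python
# Finite stand-ins for the two domains (an assumption; the text uses all x > 0, y > 0).
xs = [1, 4, 9, 16]
ys = [1, 2, 3, 4]

def forall_exists(xs, ys):
    """For every x there exists y such that y**2 == x."""
    return all(any(y * y == x for y in ys) for x in xs)

def exists_forall(xs, ys):
    """There exists y such that for every x, y**2 == x."""
    return any(all(y * y == x for x in xs) for y in ys)

print(forall_exists(xs, ys))  # True: each x has its own square root in ys
print(exists_forall(xs, ys))  # False: no single y is the square root of every x
```

The only difference between the two functions is the nesting order of `all` and `any`, which mirrors the order of the quantifiers in the two statements.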

The importance of the order of quantifiers makes it necessary to exercise caution 
in forming the negation of statements with multiple quantifiers, since the negation 
will also involve the use of multiple quantifiers. To elaborate, let $\Pi(a, b)$ denote a
property defined on elements a and b in sets A and B, respectively. Consider the
statement P


For every $a \in A$, there exists $b \in B$ such that $\Pi(a, b)$ holds.


The statement P will be falsified if there is even one $a \in A$ for which the property
$\Pi(a, b)$ fails to hold, no matter what we take for the value of $b \in B$. Thus, the
negation of P is the statement ~ P defined by


There exists $a \in A$ such that for all $b \in B$, $\Pi(a, b)$ fails.


We reiterate the importance of the order of quantifiers in forming this negation. The
negation of P is not the statement


For every $b \in B$, there exists $a \in A$ such that $\Pi(a, b)$ fails.
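
The rule for negating a statement with mixed quantifiers can be verified exhaustively on a tiny domain. The sketch below is illustrative only: the two-element sets `A` and `B`, and the helper names `P`, `neg_P`, and `swapped`, are our own assumptions and do not appear in the text. It enumerates every property on A x B and confirms that the correct negation always has the opposite truth value to P, while the statement with the quantifiers in the wrong order does not.

```python
from itertools import product

def P(A, B, prop):
    """For every a in A, there exists b in B such that prop(a, b) holds."""
    return all(any(prop(a, b) for b in B) for a in A)

def neg_P(A, B, prop):
    """There exists a in A such that for all b in B, prop(a, b) fails."""
    return any(all(not prop(a, b) for b in B) for a in A)

def swapped(A, B, prop):
    """For every b in B, there exists a in A such that prop(a, b) fails."""
    return all(any(not prop(a, b) for a in A) for b in B)

A = [0, 1]
B = [0, 1]
pairs = list(product(A, B))

def all_properties():
    """Yield every property on A x B, encoded by the set of pairs where it holds."""
    for mask in range(2 ** len(pairs)):
        true_pairs = {pr for i, pr in enumerate(pairs) if (mask >> i) & 1}
        yield lambda a, b, s=true_pairs: (a, b) in s

# neg_P is the exact opposite of P for all 16 properties on this 2 x 2 domain.
negation_ok = all(neg_P(A, B, pi) == (not P(A, B, pi)) for pi in all_properties())

# With prop(a, b) meaning a == b, both P and the swapped statement are true,
# so the swapped statement cannot be the negation of P.
equal = lambda a, b: a == b
print(negation_ok, P(A, B, equal), swapped(A, B, equal), neg_P(A, B, equal))
```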


A.4 Necessary and Sufficient Conditions 


The study of optimization theory involves the use of conditions that are called neces- 
sary conditions and sufficient conditions. These are implications of the form P => Q 
and Q => P that were discussed in the previous section. There is, however, one point 
that bears elaboration. Necessary and sufficient conditions in optimization theory are 
usually derived under some subsidiary hypotheses on the problem, and their valid- 
ity depends on these hypotheses holding. Moreover, these hypotheses need not be 
the same; that is, the necessary conditions may be derived under one set, while the 
sufficient conditions may use another. We discuss some aspects of this issue here. 

We begin with a definition of necessary conditions in the abstract. Suppose an 
implication of the form 


P => Q


is valid. Then, Q is said to be a necessary condition for P. The reason for this 
terminology is apparent: if P => Q holds, and P is to be true, then Q must necessarily
also be true. 

In optimization theory, P is usually taken to be a statement of the form 


P: x* is a maximum of a function f on a constraint set D.


Under some subsidiary hypotheses on f and D, we then try to identify implications 
Q of x* being a maximum point. Any such implication Q, that arises from the 
assumption that x* is a maximum, is a necessary condition for an optimum, whenever 
the subsidiary hypotheses used in this derivation hold. 

This is the approach we take in Chapters 4 through 6 of this book. In Chapter 4, for 
instance, we make the following subsidiary hypotheses, which we shall call $H_1$ here:


$H_1$: D is open, and f is differentiable on D.²


We then show that if the proposition P (that x* is a maximum of f on D) is true,
then the proposition Q must be true, where Q states that


Q: Df(x*) = 0.


Thus, the condition that Df(x*) = 0 is a necessary condition for f to have a
maximum at x*, provided $H_1$ holds.

For a definition of sufficient conditions in the abstract, suppose that a statement of 
the form 


Q => P


is true. Then, Q is said to be a sufficient condition for P. In words, since the truth of 
Q must imply the truth of P, it is enough for P to be true that Q is true. 

Sufficient conditions come in many forms in optimization theory. In one, we take 
P to be a statement of the form 


P: x* is a maximum of f on D.


The objective now is to find, under some subsidiary hypotheses on the problem, a 
set of conditions Q such that whenever the conditions in Q are met at the point x*, 
P is always true. Such conditions are called sufficient conditions for the point x* to 
be a maximum. 

This is the route we follow in Chapters 7 and 8 of this book. In Chapter 7, for 
instance, one of the results assumes the subsidiary hypotheses $H_2$ that


$H_2$: D is convex and open, and f is concave and differentiable on D.

It is then shown that if proposition Q that

Q: Df(x*) = 0


holds, then proposition P is also true. That is, the condition Df(x*) = 0 is a sufficient
condition for a maximum at x* provided $H_2$ holds.

An alternative class of sufficient conditions that optimization theory studies in- 
volves the statement P* that 


P*: There exists a maximum of f on D. 


The objective is then to find a set of conditions Q* on f and D such that proposition 
P* is true. Such conditions Q*, in the context of optimization theory, are called 
sufficient conditions for the existence of a maximum. These are the conditions that
Chapter 3 and the chapters on dynamic programming are primarily concerned with.


² The actual conditions we assume are weaker than this.

In Chapter 3, for instance, we show that if Q* is the set of conditions that

Q*: D is a compact set in $\mathbb{R}^n$ and f is continuous on D


then it is true that Q* => P*, i.e., that a solution exists in the maximization problem.

A condition Q is said to be both necessary and sufficient for P if it is the case that


P <=> Q.


In this case, P and Q are equivalent in the sense that either both are true, or both are 
false. Thus, identifying the truth of P is the same thing as identifying the truth of Q. 

In some sense, this equivalence makes conditions that are necessary and suffi- 
cient an ideal set of conditions for working with in optimization theory. However, as 
should be apparent from the two examples above, necessary conditions often require 
far weaker subsidiary hypotheses on the problem than sufficient conditions. There- 
fore, necessary conditions derived under a given set of subsidiary hypotheses are 
unlikely to also be sufficient under those hypotheses. For instance, the condition that 
Df(x*) = 0 is necessary for x* to be a maximum under $H_1$, but it is not sufficient
under $H_1$. Sufficiency of this condition requires the stronger hypotheses contained
in $H_2$.
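
A small numerical illustration may help fix the distinction. The sketch below is ours, not the text's: the interval D = (-1, 1) and the functions f(x) = x^3 and g(x) = -x^2 are hypothetical choices. Both satisfy the hypothesis that D is open and the function is differentiable, and both have zero derivative at x* = 0, but only g is concave. The maximum check is carried out on a finite grid of sample points, so it is evidence rather than proof.

```python
def f(x):
    return x ** 3          # differentiable on D = (-1, 1), but not concave

def df(x):
    return 3 * x ** 2      # derivative of f

def g(x):
    return -(x ** 2)       # concave and differentiable on D

def dg(x):
    return -2 * x          # derivative of g

# Sample points of the open interval D = (-1, 1).
grid = [i / 100 for i in range(-99, 100)]

def is_max_on(points, func, x_star):
    """True if func(x_star) >= func(x) at every sampled point x."""
    return all(func(x_star) >= func(x) for x in points)

# The first-order condition holds at 0 for both functions ...
print(df(0.0) == 0, dg(0.0) == 0)
# ... but 0 maximizes only the concave function g, not f.
print(is_max_on(grid, f, 0.0), is_max_on(grid, g, 0.0))
```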

On the other hand, since the subsidiary hypotheses used in deriving sufficient 
conditions are typically stronger than those used to prove that the same conditions 
are necessary, it is possible that sufficient conditions derived under a given set of 
hypotheses may also be necessary under the same hypotheses. For instance, it is 
apparent that whenever $H_2$ is met, $H_1$ is also met. Therefore, we have the following
result: the proposition Q that


Q: Df(x*) = 0

is necessary and sufficient for the proposition P that

P: x* is a maximum of f on D


provided $H_2$ holds.


Appendix B 
The Real Line 


There are many methods for obtaining the real number system from the rational 
number system. We describe one in this appendix, which “constructs” the real number 
system as (appropriately defined) limits of Cauchy sequences of rational numbers. 
An alternative constructive approach—the method of Dedekind cuts—is described 
in Rudin (1976). A third approach, which is axiomatic, rather than constructive, may 
be found in Apostol (1967), Bartle (1964), or Royden (1968). 

Our presentation in this appendix, which is based on Strichartz (1982), is brief and 
relatively informal. For omitted proofs and greater detail than we provide here, we 
refer the reader to Hewitt and Stromberg (1965), or Strichartz (1982). 


B.1 Construction of the Real Line 


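We use the following notation: $\mathbb{N}$ will denote the set of natural numbers and $\mathbb{Z}$ the
set of all integers:


$$\mathbb{N} = \{1, 2, 3, \ldots\}$$
$$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}.$$


$\mathbb{Q}$ will denote the set of rational numbers:


$$\mathbb{Q} = \{x \mid x = p/q,\ p, q \in \mathbb{Z},\ q \neq 0\}.$$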


It is assumed throughout this appendix that the reader is familiar with handling 
rational numbers, and with the rules for addition (+) and multiplication (·) of such
numbers. It can be shown that under these operations, the rational numbers form a
field; that is, for all rationals a, b, and c in $\mathbb{Q}$, the following conditions are met:


1. Addition is commutative: $a + b = b + a$.
2. Addition is associative: $(a + b) + c = a + (b + c)$.
3. Multiplication is commutative: $a \cdot b = b \cdot a$.
4. Multiplication is associative: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
5. Multiplication distributes over addition: $a \cdot (b + c) = a \cdot b + a \cdot c$.
6. 0 is the additive identity: $0 + a = a$.
7. 1 is the multiplicative identity: $1 \cdot a = a$.
8. Every rational has a negative: $a + (-a) = 0$.
9. Every non-zero rational has an inverse: $a \cdot \frac{1}{a} = 1$.


A rational p/q (with $q \neq 0$) is positive if p and q are both positive; it is negative if
p is positive and q negative, or q is positive and p negative; and it is zero if p = 0.
Every rational is either positive, negative, or zero. The rationals are also ordered: for
every two distinct rationals a and b, we have either $a > b$ (if $a - b$ is positive) or
$b > a$ (if $a - b$ is negative). Finally, the rationals satisfy the triangle inequality: if x
and y are any rationals, we have


$$|x + y| \le |x| + |y|,$$


where $|x|$ denotes the absolute value of a rational x, i.e., $|x| = x$ if $x \ge 0$, and
$|x| = -x$ otherwise.

The need to extend the rational number system arises from the observation that 
there are objects such as $\sqrt{2}$ that are easy to describe (say, as the solution to $x^2 = 2$),
and that intuitively “should” form part of a reasonable number system, but that are 
not part of the rational number system. The real number system attempts to extend 
the rational number system to close such holes, but also in a manner which ensures 
that the field properties of the rationals and the order relation (>) are also extended 
to the reals. 

Two simple and intuitive ideas underlie the construction of the reals that we present 
here. The first is that although objects such as $\sqrt{2}$ are not rational, they are capable
of being approximated arbitrarily closely by rational numbers. For example, the
sequence of rational numbers


1.4, 1.41, 1.414, 1.4142, ...

when squared, results in the following sequence, which gets closer and closer to 2:

1.96, 1.9881, 1.999396, 1.99996164, ....


The second idea central to our construction is that the approximating sequence of
rationals cannot be unique. For instance, the sequences


1.4, 1.41, 1.414, 1.4142, 1.41421, ...
1.5, 1.42, 1.415, 1.4143, 1.41422, ...

both approximate $\sqrt{2}$, the first sequence from “below” (each term in the sequence is
strictly smaller than $\sqrt{2}$), and the second from “above” (each term is strictly larger
than $\sqrt{2}$).
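
These claims can be checked with exact rational arithmetic. The following sketch uses Python's standard `fractions.Fraction` type to hold the listed terms as rationals; the variable names are our own. It confirms that every listed term of the first sequence squares to less than 2, that every listed term of the second squares to more than 2, and that the gap between corresponding terms shrinks by a factor of ten at each step.

```python
from fractions import Fraction

# The first five terms of each sequence from the text, as exact rationals.
below = [Fraction(s) for s in ("1.4", "1.41", "1.414", "1.4142", "1.41421")]
above = [Fraction(s) for s in ("1.5", "1.42", "1.415", "1.4143", "1.41422")]

squares_below = [q * q for q in below]
squares_above = [q * q for q in above]

# Distance between corresponding terms of the two sequences.
gaps = [a - b for a, b in zip(above, below)]

print([float(s) for s in squares_below])
print([float(s) for s in squares_above])
print(gaps)
```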

To sum up, the idea is to regard real numbers as limits of sequences of rational 
numbers, but with the caveat that two sequences of rationals are to be regarded as 
equivalent (i.e., as defining the same real number) if they are themselves getting 
closer together, the farther one goes out in the sequence. We formalize these ideas 
now. 

Let $|a|$ denote the absolute value of a rational number a, and let $|a - b|$ denote
the distance between two rationals a and b. A sequence of rational numbers $\{x_n\}$ is
called a Cauchy sequence if it is the case that for all natural numbers $n \in \mathbb{N}$, there
exists an integer $m(n)$ such that for all $k, l \ge m(n)$, we have


$$|x_k - x_l| < \frac{1}{n}.$$


In words, a Cauchy sequence of rationals is one where the distance between terms
in the tail of the sequence gets smaller, the farther one goes out in the sequence.

Two Cauchy sequences of rationals $\{x_n\}$ and $\{y_n\}$ will be called equivalent if for
all natural numbers $n \in \mathbb{N}$, there exists $m(n)$ such that for all $k \ge m(n)$, we have


$$|x_k - y_k| < \frac{1}{n}.$$


We write this as $\{x_n\} \sim \{y_n\}$. In words, equivalent Cauchy sequences are those whose
terms are getting closer together, the farther we go out in the sequences. Intuitively,
if two Cauchy sequences are equivalent, they can be viewed as approximating the
same real number.
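
The two definitions can be exercised on the decimal approximations of $\sqrt{2}$ discussed earlier. The sketch below is a finite check under stated assumptions, not a proof: `x(k)` is the k-digit decimal truncation of $\sqrt{2}$, `y(k)` is the same number bumped up by one unit in the last digit, `m(n)` is one particular (assumed) choice of the index in the definitions, and the conditions are tested only up to a finite `horizon`.

```python
from fractions import Fraction
from math import isqrt

def x(k):
    """k-digit decimal truncation of sqrt(2), as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

def y(k):
    """The same truncation bumped up by one unit in the last digit."""
    return x(k) + Fraction(1, 10 ** k)

def m(n):
    """A candidate index m(n): the smallest k with 10**(-k) < 1/n (an assumed choice)."""
    k = 1
    while Fraction(1, 10 ** k) >= Fraction(1, n):
        k += 1
    return k

def cauchy_up_to(seq, n, horizon):
    """Check |seq(k) - seq(l)| < 1/n for all m(n) <= k, l < horizon."""
    start = m(n)
    return all(abs(seq(k) - seq(l)) < Fraction(1, n)
               for k in range(start, horizon) for l in range(start, horizon))

def equivalent_up_to(seq1, seq2, n, horizon):
    """Check |seq1(k) - seq2(k)| < 1/n for all m(n) <= k < horizon."""
    start = m(n)
    return all(abs(seq1(k) - seq2(k)) < Fraction(1, n)
               for k in range(start, horizon))

for n in (1, 10, 100, 1000):
    print(n, m(n), cauchy_up_to(x, n, 12), equivalent_up_to(x, y, n, 12))
```

Here `x` reproduces the sequence 1.4, 1.41, 1.414, ... and `y` the sequence 1.5, 1.42, 1.415, ..., so the check is consistent with viewing both as approximations of the same real number.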

It is easy to show that $\sim$ is, in fact, an equivalence relationship: it is reflexive
($\{x_n\} \sim \{x_n\}$), and symmetric ($\{x_n\} \sim \{y_n\}$ implies $\{y_n\} \sim \{x_n\}$). It is also transitive:
that is, $\{x_n\} \sim \{y_n\}$ and $\{y_n\} \sim \{z_n\}$ imply $\{x_n\} \sim \{z_n\}$. We leave the proof of the
transitivity of $\sim$ as an exercise to the reader.

An equivalence class C of a Cauchy sequence of rationals is a collection of Cauchy
sequences of rationals such that if $\{x_n\} \in C$ and $\{y_n\} \in C$, then $\{x_n\} \sim \{y_n\}$.
At an intuitive level, all members of a given equivalence class may be viewed as 
approximating the same real number. Thus, we may as well identify each equivalence 
class with a real number, and this is, in fact, our definition: 


Definition B.1 The set of all equivalence classes of Cauchy sequences of rationals,
denoted $\mathbb{R}$, is called the real number system. A typical element of $\mathbb{R}$, denoted x, is
called a real number.

In keeping with the intuition underlying this definition, we will also say that if
$\{x_n\}$ is an element of the equivalence class $x \in \mathbb{R}$, then $\{x_n\}$ converges to x, or that
x is the limit of the sequence $\{x_n\}$.


B.2 Properties of the Real Line 


We now turn to an examination of the properties possessed by the real line, when it is
defined as in the previous section. We are especially interested in examining whether 
the reals also constitute a field, and whether they can be ordered. The first question 
requires us to first define the notion of addition and multiplication for arbitrary real 
numbers. The following result shows that one may proceed to do this in the obvious 
manner. 


Lemma B.2 Let $\{x_n\}$ and $\{x'_n\}$ be Cauchy sequences of rationals in the equivalence
class $x \in \mathbb{R}$, and let $\{y_n\}$ and $\{y'_n\}$ be Cauchy sequences of rationals in the equivalence
class $y \in \mathbb{R}$. Then:


1. $\{x_n + y_n\}$ and $\{x'_n + y'_n\}$ are both Cauchy sequences of rationals. Moreover, they
are equivalent sequences.

2. $\{x_n \cdot y_n\}$ and $\{x'_n \cdot y'_n\}$ are both Cauchy sequences of rationals. Moreover, they
are equivalent sequences.


Proof Left as an exercise. □


In view of this lemma, we may simply define the sum $(x + y)$ of two real numbers x
and y in $\mathbb{R}$ as the equivalence class of $\{x_n + y_n\}$, where $\{x_n\}$ converges to x (i.e., $\{x_n\}$
is in the equivalence class of x) and $\{y_n\}$ converges to y. Similarly, the product $x \cdot y$
can be defined as the equivalence class of $\{x_n \cdot y_n\}$. With addition and multiplication
defined thus, our next result states that one of the properties desirable in $\mathbb{R}$ is, in fact,
true. The proof of this result is omitted.


Theorem B.3 The real numbers form a field. 


Now, let $\{x_n\}$ be any Cauchy sequence of rationals converging to $x \in \mathbb{R}$. Then, x
is said to be positive if there is $n \in \mathbb{N}$ and $m(n)$ such that


$$x_k \ge \frac{1}{n}, \quad k \ge m(n).$$


The number x is negative if $-x$ is positive. Under these definitions, it can be shown
that the real numbers inherit another important property of the rationals:

Theorem B.4 Every real number is positive, negative, or zero. The sum and product
of positive numbers is positive, while the product of a negative and positive number
is negative.


Proof Omitted. □


When the notion of positivity on a field satisfies the conditions of this theorem, 
the field is called an ordered field. Thus, the reals, like the rationals, are an ordered 
field.


The notion of a positive number can be used to define the notion of absolute values
for real numbers. Given any $x \in \mathbb{R}$, the absolute value of x, denoted $|x|$, is defined
by


$$|x| = \begin{cases} x, & x \ge 0 \\ -x, & x < 0. \end{cases}$$

We can also define inequalities for real numbers: given x and y in $\mathbb{R}$, we say that
$x > y$ if $x - y > 0$, that $x < y$ if $y - x > 0$, etc. Finally, we define the (Euclidean)
distance between two real numbers x and y to be the absolute value $|x - y|$ of their
difference. The following simple result comes in handy surprisingly often:


Lemma B.5 Let x and $y \in \mathbb{R}$ be the limits of the Cauchy sequences of rationals
$\{x_n\}$ and $\{y_n\}$, respectively. If $x_n \ge y_n$ for each n, then $x \ge y$.


Proof If not, then $x < y$, so $x - y < 0$. Since $x - y$ is the limit of $x_n - y_n$, there
must then exist $m \in \mathbb{N}$ such that $x_n - y_n \le -\frac{1}{m}$ for all large n. This contradicts the
assumption that $x_n \ge y_n$ for all n. □


Theorem B.6 (Triangle inequality for real numbers) Let x and y be real numbers.
Then,

$$|x + y| \le |x| + |y|.$$

Proof Let $\{x_k\}$ and $\{y_k\}$ be sequences of rationals converging to x and y, respectively.
By definition, the Cauchy sequence of rationals $\{x_k + y_k\}$ converges to $x + y$. By
the triangle inequality for rationals,


$$|x_k + y_k| \le |x_k| + |y_k|,$$


for each k. Taking limits as $k \to \infty$, and using Lemma B.5, the theorem is proved. □

There is another important property of the real line: it is the case that there are
rational numbers arbitrarily close to any real number. This fact is expressed by saying
that the rationals are dense in the reals.¹


Theorem B.7 (Denseness of Rationals) Given any real number x and any $n \in \mathbb{N}$,
there exists a rational number $y \in \mathbb{Q}$ such that $|x - y| \le \frac{1}{n}$.

Proof Let $\{x_n\}$ be a Cauchy sequence of rationals converging to x. Then, given n,
there exists $m(n)$ such that


$$|x_k - x_l| < \frac{1}{n}, \quad k, l \ge m(n).$$


Define $y = x_{m(n)}$. Then, $|x_k - y| < \frac{1}{n}$ for all $k \ge m(n)$, so taking limits as $k \to \infty$, and
using Lemma B.5, we obtain $|x - y| \le \frac{1}{n}$. □


Extending the definition for rational numbers in the obvious manner, we say that
a sequence of real numbers $\{x_n\}$ converges to a limit $x \in \mathbb{R}$ if it is the case that for
all $\epsilon > 0$, there is $n(\epsilon)$ such that for all $n \ge n(\epsilon)$, we have


$$|x_n - x| < \epsilon.$$


We also say that $\{x_n\}$ is a Cauchy sequence if for all $\epsilon > 0$, there is $n(\epsilon)$ such that
for all $k, l \ge n(\epsilon)$, we have


$$|x_k - x_l| < \epsilon.$$


Note that these definitions are meaningful, since we have defined inequalities between 
any two real numbers. 

Our last result is particularly important because it shows that if we do to the 
reals what we did to the rationals, no new numbers are created: the limits of Cauchy 
sequences of real numbers are also only real numbers. This fact is expressed as saying 
that the real line $\mathbb{R}$ is complete.²


Theorem B.8 (Completeness of $\mathbb{R}$) A sequence of real numbers $\{x_n\}$ has a limit if
and only if it is a Cauchy sequence.


Proof It is easy to see that if a sequence $\{x_n\}$ converges to a limit x, then $\{x_n\}$ must
be a Cauchy sequence. We leave the details to the reader as an exercise.

So suppose that $\{x_n\}$ is a Cauchy sequence of real numbers. We are to show that
there is $x \in \mathbb{R}$ such that $x_n \to x$.

Since each $x_n$ is a real number, there is a rational number $y_n$ arbitrarily close to
it. In particular, we can choose a sequence $\{y_n\}$ of rationals such that for each n, we
have


$$|x_n - y_n| \le \frac{1}{n}.$$

It is not very difficult to show that since $\{x_n\}$ is a Cauchy sequence, $\{y_n\}$ is also a
Cauchy sequence. Once again, we leave the details to the reader as an exercise. Since
$\{y_n\}$ is a Cauchy sequence of rationals, it converges to a limit $x \in \mathbb{R}$.

We will show that $x = \lim_{n \to \infty} x_n$ also. To this end, note that for any n, we have


$$|x - x_n| \le |x - y_n| + |y_n - x_n| \le |x - y_n| + \frac{1}{n}.$$


Since $\{y_n\}$ converges to x, it is the case that $|x - y_n|$ can be made arbitrarily small
for all n sufficiently large. Since this is also true of the term $\frac{1}{n}$, it follows that $\{x_n\}$
converges to x. □


¹ For a definition of denseness in the context of abstract metric spaces, see Appendix C.
² For a definition of completeness in the context of abstract metric spaces, see Appendix C.


Appendix C 


Structures on Vector Spaces 


This appendix provides a brief introduction to vector spaces, and the structures (inner 
product, norm, metric, and topology) that can be placed on them. It also describes an 
abstract context for locating the results of Chapter 1 on the topological structure on 
$\mathbb{R}^n$. The very nature of the material discussed here makes it impossible to be either
comprehensive or complete; rather, the aim is simply to give the reader a flavor of 
these topics. For more detail than we provide here, and for omitted proofs, we refer 
the reader to the books by Bartle (1964), Munkres (1975), or Royden (1968). 


C.1 Vector Spaces 
A vector space over $\mathbb{R}$ (henceforth, simply vector space) is a set V, on which are
defined two operators: “addition,” which specifies for each x and y in V an element
$x + y$ in V; and “scalar multiplication,” which specifies for each $a \in \mathbb{R}$ and $x \in V$
an element $ax$ in V. These operators are required to satisfy the following axioms for
all $x, y, z \in V$ and $a, b \in \mathbb{R}$:


1. Addition satisfies the commutative group axioms:


(a) Commutativity: $x + y = y + x$.
(b) Associativity: $x + (y + z) = (x + y) + z$.
(c) Existence of zero: There is an element 0 in V such that $x + 0 = x$.
(d) Existence of additive inverse: For every $x \in V$, there is $(-x) \in V$ such that
$x + (-x) = 0$.
2. Scalar multiplication is


(a) Associative: $(ab)x = a(bx)$, and
(b) Distributive: $a(x + y) = ax + ay$, and $(a + b)x = ax + bx$.


Throughout, the letters a, b, c, etc. will denote scalars while x, y, z will denote
elements of V.

Examples of vector spaces, used throughout this chapter, include the following:


Example C.1 Let $V = \mathbb{R}^n$. Define addition and scalar multiplication on V in the
usual way: for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in V, and $a \in \mathbb{R}$, let


$$x + y = (x_1 + y_1, \ldots, x_n + y_n)$$
$$ax = (ax_1, \ldots, ax_n).$$


It is an elementary matter to check that addition and scalar multiplication so defined
meet all of the requisite axioms to make $\mathbb{R}^n$ a vector space over $\mathbb{R}$. □


Example C.2 Let V be the space of bounded sequences in $\mathbb{R}$. A typical element of
V is a sequence $x = (x_1, x_2, \ldots)$ which has the property that


$$\sup_{i \in \mathbb{N}} |x_i| < \infty,$$


where $|x_i|$ is the absolute value of $x_i$. Define addition and scalar multiplication on V
as follows: for $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$ in V, and $a \in \mathbb{R}$, let


$$x + y = (x_1 + y_1, x_2 + y_2, \ldots)$$
$$ax = (ax_1, ax_2, \ldots).$$


Then, V is a vector space. □


Example C.3 Let $C([0, 1])$ denote the space of all continuous functions from $[0, 1]$
to $\mathbb{R}$:


$$C([0, 1]) = \{f: [0, 1] \to \mathbb{R} \mid f \text{ is continuous on } [0, 1]\}.$$


Let $V = C([0, 1])$. Define addition and scalar multiplication on V as follows: for f,
g in V and $a \in \mathbb{R}$, let $(f + g)$ and $af$ be the functions in V, whose values at any
$\xi \in [0, 1]$ are given respectively by


$$(f + g)(\xi) = f(\xi) + g(\xi)$$
$$(af)(\xi) = a f(\xi).$$

Once again, it is a simple matter to verify that all relevant axioms are met, so $C([0, 1])$
defines a vector space over $\mathbb{R}$. □


Many different structures can be placed on V. We will examine four of these (inner
product spaces, normed spaces, metric spaces, and topological spaces) below. The
four we examine are successive generalizations in that every inner product space
generates a normed space, every normed space a metric space, and every metric
space a topological space. The reverse containments need not always be true; thus,
for instance, there exist topological spaces that are not generated by any metric space,
and metric spaces that are not generated by any normed space.


C.2 Inner Product Spaces 


The most restrictive structure on V that we shall study is that of the inner product. 
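An inner product on V is a function $\langle \cdot, \cdot \rangle$ from $V \times V$ to $\mathbb{R}$ that satisfies the following
conditions for all $x, y, z \in V$ and $a, b \in \mathbb{R}$:


1. Symmetry: $\langle x, y \rangle = \langle y, x \rangle$.
2. Bilinearity: $\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle$, and $\langle x, ay + bz \rangle = a \langle x, y \rangle + b \langle x, z \rangle$.
3. Positivity: $\langle x, x \rangle \ge 0$, $\langle x, x \rangle = 0$ iff $x = 0$.


An inner product space is a pair $(V, \langle \cdot, \cdot \rangle)$, where V is a vector space over $\mathbb{R}$, and
$\langle \cdot, \cdot \rangle$ is an inner product on V.

Typical examples of inner product spaces include the following: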


Example C.4 Let $V = \mathbb{R}^n$, with addition and scalar multiplication defined as in
Example C.1. For $x, y \in \mathbb{R}^n$, let


$$\langle x, y \rangle = x \cdot y = \sum_{i=1}^{n} x_i y_i.$$

We have seen in Chapter 1 that $\langle \cdot, \cdot \rangle$ defined in this way satisfies the conditions
of symmetry, positivity, and bilinearity. Thus, $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$ defines an inner
product over $V = \mathbb{R}^n$, the Euclidean inner product. □


Example C.5 Let $V = C([0, 1])$, with addition and scalar multiplication defined
as in Example C.3. For $f, g \in V$, let


$$\langle f, g \rangle = \int_0^1 f(\xi) g(\xi) \, d\xi.$$

It is immediate that the symmetry condition is met. To check positivity, note that

$$\langle f, f \rangle = \int_0^1 f(\xi) f(\xi) \, d\xi = \int_0^1 (f(\xi))^2 \, d\xi.$$

This last term is always nonnegative, since $(f(\xi))^2 \ge 0$ for all $\xi$, and the integral of
a function which only takes on nonnegative values is nonnegative. Finally,


$$\begin{aligned}
\langle af + bg, h \rangle &= \int_0^1 (a f(\xi) + b g(\xi)) h(\xi) \, d\xi \\
&= \int_0^1 a f(\xi) h(\xi) \, d\xi + \int_0^1 b g(\xi) h(\xi) \, d\xi \\
&= a \langle f, h \rangle + b \langle g, h \rangle,
\end{aligned}$$


so bilinearity also holds. Therefore, $\langle f, g \rangle = \int_0^1 f(\xi) g(\xi) \, d\xi$ defines an inner product
over $V = C([0, 1])$. □


It is important to note that more than one inner product can be defined on the
same vector space V. For instance, let $V = \mathbb{R}^2$, and let $a_1$ and $a_2$ be strictly positive
numbers. Then, it can be verified that


$$\langle x, y \rangle = a_1 x_1 y_1 + a_2 x_2 y_2$$

defines an inner product on V. Unless $a_1 = a_2 = 1$, this is not the Euclidean inner
product on $\mathbb{R}^2$.
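
The verification can be spot-checked numerically. The sketch below is illustrative only: the weights `A1 = 2` and `A2 = 3` are hypothetical choices of strictly positive numbers, and the three axioms are tested on randomly drawn integer vectors and scalars, which keeps the arithmetic exact but is of course not a proof.

```python
import random

A1, A2 = 2, 3  # hypothetical strictly positive weights a_1 and a_2

def inner(x, y):
    """Weighted inner product a_1*x_1*y_1 + a_2*x_2*y_2 on R^2."""
    return A1 * x[0] * y[0] + A2 * x[1] * y[1]

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(a, x):
    return (a * x[0], a * x[1])

def check_axioms(trials=200, seed=0):
    """Test symmetry, bilinearity, and positivity on random integer inputs."""
    rng = random.Random(seed)
    vec = lambda: (rng.randint(-9, 9), rng.randint(-9, 9))
    for _ in range(trials):
        x, y, z = vec(), vec(), vec()
        a, b = rng.randint(-9, 9), rng.randint(-9, 9)
        if inner(x, y) != inner(y, x):
            return False
        lhs = inner(add(scale(a, x), scale(b, y)), z)
        if lhs != a * inner(x, z) + b * inner(y, z):
            return False
        if inner(x, x) < 0 or (inner(x, x) == 0) != (x == (0, 0)):
            return False
    return True

print(check_axioms())
# The unit vector (1, 0) has "length squared" 2 here, not 1 as in the Euclidean case.
print(inner((1, 0), (1, 0)))
```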


We close this section with a statement of a very useful result regarding inner 
products: 


Theorem C.6 (Cauchy-Schwarz Inequality) Let $\langle \cdot, \cdot \rangle$ be an inner product on V.
Then,


$$|\langle x, y \rangle| \le \langle x, x \rangle^{1/2} \langle y, y \rangle^{1/2}$$

for all $x, y \in V$.


Proof The proof of the Cauchy-Schwarz inequality for the Euclidean inner product
in $\mathbb{R}^n$ (see Chapter 1, Theorem 1.2) relied only on the three properties defining an
inner product, and not on the specific definition of the Euclidean inner product itself.
The current result may, therefore, be established by mimicking that proof, with the
obvious changes. □
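
To see the inequality at work outside $\mathbb{R}^n$, one can evaluate the integral inner product of Example C.5 exactly on polynomials. The sketch below rests on our own assumptions: polynomials are represented by coefficient lists, the two sample polynomials are arbitrary, and the integral of $t^{i+j}$ over [0, 1] is taken to be $1/(i + j + 1)$. To avoid square roots, it checks the equivalent squared form of the inequality.

```python
from fractions import Fraction

def inner(f, g):
    """Integral over [0, 1] of f*g, for polynomials given as coefficient
    lists [c0, c1, ...] meaning c0 + c1*t + c2*t**2 + ..."""
    return sum(Fraction(a * b, i + j + 1)
               for i, a in enumerate(f) for j, b in enumerate(g))

f = [1, 2]      # f(t) = 1 + 2t  (hypothetical example)
g = [0, 0, 3]   # g(t) = 3t^2    (hypothetical example)

lhs = inner(f, g) ** 2             # <f, g>^2
rhs = inner(f, f) * inner(g, g)    # <f, f> <g, g>
print(inner(f, g), inner(f, f), inner(g, g))
print(lhs <= rhs)
```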


C.3 Normed Spaces 


A notion less restrictive (in a sense to be made precise) than the inner product is that
of a norm. A norm is a function $\|\cdot\|: V \to \mathbb{R}$ that satisfies the following requirements
for all $x, y \in V$:


1. Positivity: $\|x\| \ge 0$, $\|x\| = 0$ iff $x = 0$.
2. Homogeneity: $\|ax\| = |a| \cdot \|x\|$.
3. Triangle Inequality: $\|x + y\| \le \|x\| + \|y\|$.


A normed space is a pair $(V, \|\cdot\|)$, where V is a vector space, and $\|\cdot\|$ is a norm
on V. Standard examples of normed spaces include the following:


Example C.7 Let $V = \mathbb{R}^n$, and let $p \ge 1$ be any real number. Define $\|\cdot\|_p: V \to \mathbb{R}$
by


$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p},$$


where $|x_i|$ denotes the absolute value of $x_i$.

It is easy to see that $\|\cdot\|_p$ satisfies homogeneity. It also satisfies positivity since $|x_i|$,
and, hence, $|x_i|^p$, is nonnegative for each i. Finally, that $\|\cdot\|_p$ satisfies the triangle
inequality is a consequence of an inequality known as the Minkowski inequality. For
a general statement of Minkowski’s inequality, we refer the reader to Royden (1968,
p.114). In the special context of $\mathbb{R}^n$, the Minkowski inequality states that


$$\|x + y\|_p \le \|x\|_p + \|y\|_p,$$


which is precisely the triangle inequality.

Therefore, for any $p \ge 1$, $\|\cdot\|_p$ defines a norm over $\mathbb{R}^n$, the so-called p-norm
on $\mathbb{R}^n$. □


Note that, as with inner products, Example C.7 shows that many different norms
can be placed on the same vector space. Examples C.8 and C.9 below, which define
various norms on the space $C([0, 1])$, reiterate this point.

When $p = 2$, the p-norm $\|\cdot\|_p$ on $\mathbb{R}^n$ is called the Euclidean norm. As $p \to \infty$,
we obtain the sup-norm


$$\|x\|_\infty = \|x\|_{\sup} = \sup_{i \in \{1, \ldots, n\}} |x_i|.$$
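
The behavior of the p-norm as p grows can be observed numerically. The following is a minimal sketch in floating-point arithmetic; the vector `x` and the chosen values of p are arbitrary illustrations of ours, and the agreement with the sup-norm at large p is approximate rather than exact.

```python
def p_norm(x, p):
    """The p-norm (sum of |x_i|**p) ** (1/p) on R^n, for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def sup_norm(x):
    """The sup-norm: the largest absolute coordinate."""
    return max(abs(xi) for xi in x)

x = (3.0, -4.0, 1.0)  # hypothetical sample vector

for p in (1, 2, 4, 10, 50):
    print(p, p_norm(x, p))
print("sup", sup_norm(x))
```

For this vector the values decrease from 8 at p = 1 toward the sup-norm value 4 as p increases.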


Example C.8 Let $V = C([0, 1])$, with addition and scalar multiplication defined
as in Example C.3. For $f \in V$ and $p \ge 1$, let


$$\|f\|_p = \left( \int_0^1 |f(\xi)|^p \, d\xi \right)^{1/p}.$$


As in the earlier example, it is easy to check that the positivity and homogeneity
conditions are met by $\|\cdot\|_p$. The triangle inequality is again a consequence of the
Minkowski inequality (see Royden, 1968, p.114). □

Example C.9 Let $V = C([0, 1])$. For $f \in V$, let

$$\|f\| = \sup_{\xi \in [0, 1]} |f(\xi)|.$$


Since f is continuous and $[0, 1]$ is compact, $\|f\|$ is well defined and finite for every
$f \in V$. It is evident that $\|\cdot\|$ meets the positivity and homogeneity conditions
required of a norm. To see that it meets the triangle inequality, note that for any f
and g in V, and any $\xi \in [0, 1]$, we have


$$|f(\xi) + g(\xi)| \le |f(\xi)| + |g(\xi)| \le \|f\| + \|g\|,$$


where the first inequality is just the triangle inequality for the absolute value of real
numbers, and the second inequality follows from the definition of $\|\cdot\|$. Taking the
supremum of $|f(\xi) + g(\xi)|$ over $\xi$, we now obtain


$$\|f + g\| \le \|f\| + \|g\|,$$


and the triangle inequality is also met. Thus, $\|\cdot\|$ is a norm on $C([0, 1])$, the sup-norm. □


The following results relate norms and inner products. The first result shows that
every inner product on V generates a norm on V. The second result shows that if
a norm is generated by an inner product, it has to satisfy a certain property. It is an
easy matter to construct norms (even on $\mathbb{R}^n$) that do not satisfy this property, so a
corollary of this result is that the concept of a norm is less restrictive than that of the
inner product.


Theorem C.10 Let $\langle \cdot, \cdot \rangle$ be an inner product on V. Let $\|x\| = \langle x, x \rangle^{1/2}$. Then, $\|\cdot\|$
is a norm on V.


Proof Immediate. □


Theorem C.11 (Parallelogram Law) If $\|\cdot\|$ is a norm generated by an inner
product, then it satisfies the following equation for all $x, y \in V$:


$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$

Proof Omitted. See, e.g., Bartle (1964, p.58). □


It is actually true that the parallelogram law characterizes norms and their associated
inner products. If a norm $\|\cdot\|$ satisfies the parallelogram law, then the following
relationship, called the polarization identity,


$$\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$$


defines an inner product from the norm, and the norm defined by this inner product
must coincide with the original norm. We leave the details as an exercise.
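
Both statements are easy to probe numerically on $\mathbb{R}^2$. In the sketch below (our own illustration, with hypothetical helper names and sample vectors), the Euclidean norm satisfies the parallelogram law up to rounding error, the 1-norm violates it, and the polarization identity applied to the Euclidean norm recovers the Euclidean inner product.

```python
def norm2(x):
    """Euclidean norm on R^n."""
    return sum(xi * xi for xi in x) ** 0.5

def norm1(x):
    """1-norm on R^n (the p-norm with p = 1)."""
    return sum(abs(xi) for xi in x)

def parallelogram_gap(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the law holds at (x, y)."""
    s = tuple(a + b for a, b in zip(x, y))
    d = tuple(a - b for a, b in zip(x, y))
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

def polarization(norm, x, y):
    """(1/4)(||x+y||^2 - ||x-y||^2)."""
    s = tuple(a + b for a, b in zip(x, y))
    d = tuple(a - b for a, b in zip(x, y))
    return (norm(s) ** 2 - norm(d) ** 2) / 4

x, y = (1.0, 0.0), (0.0, 1.0)
print(parallelogram_gap(norm2, x, y))   # zero up to floating-point rounding
print(parallelogram_gap(norm1, x, y))   # nonzero: the 1-norm violates the law
print(polarization(norm2, (1.0, 2.0), (3.0, 4.0)))  # close to 1*3 + 2*4 = 11
```

The failure for the 1-norm is one way to see that not every norm on $\mathbb{R}^n$ is generated by an inner product.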


C.4 Metric Spaces 
C.4.1 Definitions 


A metric on V is a function $d: V \times V \to \mathbb{R}$ that satisfies the following conditions
for all $x, y, z \in V$:


1. Positivity: $d(x, y) \ge 0$, with equality iff $x = y$.
2. Symmetry: $d(x, y) = d(y, x)$.
3. Triangle Inequality: $d(x, z) \le d(x, y) + d(y, z)$.


A metric space is a vector space V with a metric d on V, and is usually denoted
$(V, d)$. Standard examples of metric spaces include the following:


Example C.12 Let $V = \mathbb{R}^n$. Define the metric $d_p$ on V by

$$d_p(x, y) = \|x - y\|_p,$$


where $\|\cdot\|_p$ is the p-norm of the previous section. It follows from Theorem C.14
below that $d_p$ is a metric on $\mathbb{R}^n$. Since $d_p$ is generated by the p-norm, it is called the
p-metric on $\mathbb{R}^n$. □


When $p = 2$, the p-metric $d_p$ on $\mathbb{R}^n$ is called the Euclidean metric. As $p \to \infty$,
we obtain the sup-norm metric $d_\infty$:


$$d_\infty(x, y) = \sup_{i \in \{1, \ldots, n\}} |x_i - y_i|.$$


Example C.13 Let $V = C([0, 1])$. Define the metric $d_p$ on V by


$$d_p(f, g) = \left( \int_0^1 |f(x) - g(x)|^p \, dx \right)^{1/p}.$$

Then, $d_p$ is generated by the p-norm on $C([0, 1])$, and it follows from Theorem C.14
below that $d_p$ is a metric on $C([0, 1])$. □


Just as every inner product generates a norm, it is also true that every norm generates
a metric:



Theorem C.14 Let $\|\cdot\|$ be a norm on V. Define d by $d(x, y) = \|x - y\|$, $x, y \in V$.
Then, d is a metric on V.


Proof Immediate from the properties of the norm. □


On the other hand, there do exist metrics, even on $\mathbb{R}^n$, which are not generated by
any norm. An example of such a metric is the “discrete metric” $\rho$ on $\mathbb{R}$ defined by


$$\rho(x, y) = \begin{cases} 1, & x \neq y \\ 0, & x = y. \end{cases}$$


Thus, the concept of a metric space is less restrictive than that of a normed space. 
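
One way to see why the discrete metric cannot come from a norm, offered here as our own gloss rather than the text's argument, is that a metric of the form $d(x, y) = \|x - y\|$ must inherit homogeneity: $d(ax, ay) = |a| \, d(x, y)$. The sketch below checks the three metric axioms for the discrete metric on a small, arbitrarily chosen sample of real numbers, and then exhibits a failure of this scaling property.

```python
from itertools import product

def rho(x, y):
    """The discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

points = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]  # hypothetical sample of real numbers

def is_metric_on(d, pts):
    """Check positivity, symmetry, and the triangle inequality on all sampled triples."""
    for x, y, z in product(pts, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        if d(x, y) != d(y, x):
            return False
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True

print(is_metric_on(rho, points))
# A norm-generated metric would satisfy d(2x, 2y) = 2 d(x, y); rho does not.
print(rho(2 * 1.0, 2 * 3.0), 2 * rho(1.0, 3.0))
```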


C.4.2 Sets and Sequences in Metric Spaces 


Let a metric space $(V, d)$ be given. Given a sequence $\{x_n\}$ in $V$, we will say that $\{x_n\}$
converges to the limit $x \in V$ (written $x_n \to x$) in the metric $d$, if it is the case that

$$d(x_n, x) \to 0 \text{ as } n \to \infty.$$


Note that convergence depends on the metric $d$. For instance, the sequence $\{x_n\} =
\{1/n\}$ converges in the Euclidean metric to the point 0, but not so in the discrete metric.
(In the discrete metric, the only convergent sequences are those that are eventually
constant.) In the interests of expositional cleanliness, however, we will drop the constant
reference to the underlying metric $d$ in the sequel. Thus, for instance, we will simply say
“$x_n \to x$” rather than “$x_n \to x$ in the metric $d$.”

A point $x \in V$ is said to be a limit point of a sequence $\{x_n\}$ if there exists a
subsequence $\{x_{k(n)}\}$ of $\{x_n\}$ such that $x_{k(n)} \to x$. Note that $x$ is a limit point of $\{x_n\}$
if and only if for all $r > 0$, it is the case that there are infinitely many indices $n$ for
which $d(x_n, x) < r$.

A sequence $\{x_n\}$ in $V$ is said to be a Cauchy sequence if for all $\epsilon > 0$, there is
$n(\epsilon)$ such that for all $m, n \geq n(\epsilon)$, we have $d(x_m, x_n) < \epsilon$.

Every convergent sequence in a metric space $(V, d)$ must obviously be a Cauchy
sequence, but the converse is not always true. For instance, suppose $V = (0, 1)$ and
the metric $d$ on $V$ is defined by $d(x, y) = |x - y|$ for $x, y \in V$. Then, $\{x_n\} = \{1/n\}$
is a Cauchy sequence in $V$ that does not converge (there is no $x \in V$ such that
$x_n \to x$). A metric space $(V, d)$ in which every Cauchy sequence converges is called
complete.
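
The Cauchy property of $\{1/n\}$ can be checked with exact arithmetic for a given tolerance. In the sketch below, the helper `n_eps` and its rule (take any index larger than $1/\epsilon$) are assumptions made for illustration; the check covers only a finite stretch of the tail.

```python
from fractions import Fraction
import math

def n_eps(eps):
    """An index N with |1/m - 1/n| < eps for all m, n >= N (assumed rule: N > 1/eps)."""
    return math.floor(1 / eps) + 1

eps = Fraction(1, 100)
N = n_eps(eps)
tail = [Fraction(1, n) for n in range(N, N + 200)]   # a finite piece of the tail
# Largest gap between any two sampled tail terms, compared with eps.
print(N, max(abs(a - b) for a in tail for b in tail) < eps)  # 101 True
```

Every sampled term is strictly positive, while the only candidate limit, 0, lies outside $(0, 1)$; this is why the sequence has no limit in $V$.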

The open ball of radius $r$ and center $x$ in $(V, d)$, denoted $B(x, r)$, is defined by

$$B(x, r) = \{y \in V \mid d(x, y) < r\}.$$


A set $X \subset V$ is said to be open in $V$, or simply open, if it is the case that for any
$x \in X$, there is $r > 0$ such that $B(x, r) \subset X$.

As with convergent sequences, it must be stressed that whether or not a subset $X$
of $V$ is open depends on the particular metric being used. For instance, while not all
subsets of $\mathbb{R}$ are open in the Euclidean metric, every subset $X$ of $\mathbb{R}$ is open under the
discrete metric on $\mathbb{R}$. To see this, pick any $X \subset \mathbb{R}$ and any $x \in X$. If $r \in (0, 1)$, then
from the definition of the discrete metric, the open ball $B(x, r)$ with center $x$ and
radius $r$ consists of only the point $x$. Since $x \in X$, we have shown that there exists
$r > 0$ such that $B(x, r) \subset X$. Since $x \in X$ was arbitrary, we have shown that $X$ is
an open subset of $\mathbb{R}$ under the discrete metric.
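
The difference between the two metrics shows up directly in what an open ball contains. The sketch below computes balls inside a small finite set of reals, which is only a stand-in for $\mathbb{R}$; the names `ball`, `discrete`, `euclid`, and `universe` are assumptions for the example.

```python
def ball(d, center, r, universe):
    """Points of a finite universe that lie in the open ball B(center, r)."""
    return {y for y in universe if d(center, y) < r}

discrete = lambda x, y: 0 if x == y else 1
euclid = lambda x, y: abs(x - y)

universe = [0.0, 0.25, 0.5, 0.75, 1.0]   # finite stand-in for R
print(ball(discrete, 0.5, 0.9, universe))        # {0.5}
print(sorted(ball(euclid, 0.5, 0.3, universe)))  # [0.25, 0.5, 0.75]
```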

A set $X \subset V$ is said to be closed in $V$, or simply closed, if $X^c$ is open, where $X^c$
is the complement of $X$ in $V$, that is, the set defined by

$$\{y \in V \mid y \notin X\}.$$


It is easy to mimic the steps of Theorem 1.20 in Chapter 1 to establish that 


Theorem C.15 The set $X \subset V$ is closed if and only if for all sequences $\{x_k\}$ in $X$
such that $\{x_k\}$ converges to some $x \in V$, it is the case that $x \in X$.

The set $X \subset V$ is compact if every sequence of points in $X$ contains a convergent
subsequence. That is, $X$ is compact if for all sequences $\{x_n\}$ in $X$, there is a
subsequence $\{x_{n(k)}\}$ of $\{x_n\}$, and a point $x \in X$, such that

$$\lim_{k \to \infty} x_{n(k)} = x.$$


Furthermore, a set $X \subset V$ is said to be bounded if $X$ is completely contained in
some open ball around 0, that is, if there is $r > 0$ such that $X \subset B(0, r)$.

As in $\mathbb{R}^n$ with the Euclidean metric, the following properties hold in arbitrary
metric spaces as well. The proofs are omitted:


Theorem C.16 Let $A$ be an arbitrary index set. If $X_\alpha$ is an open set in $V$ for each
$\alpha \in A$, then so is $\cup_{\alpha \in A} X_\alpha$.


Theorem C.17 Let $A$ be an arbitrary index set. If $Y_\alpha$ is a closed set in $V$ for each
$\alpha \in A$, then so is $\cap_{\alpha \in A} Y_\alpha$.


Theorem C.18 Let $F$ be a finite index set. If $X_\phi$ is an open set in $V$ for every $\phi \in F$,
then so is $\cap_{\phi \in F} X_\phi$.


Theorem C.19 Let $F$ be a finite index set. If $Y_\phi$ is a closed set in $V$ for every $\phi \in F$,
then so is $\cup_{\phi \in F} Y_\phi$.
Theorem C.20 A set $Z \subset V$ is compact if and only if every open cover of $Z$ has a
finite subcover. That is, $Z$ is compact if and only if whenever $(X_\alpha)_{\alpha \in A}$ is an arbitrary
family of open sets satisfying $Z \subset \cup_{\alpha \in A} X_\alpha$, there is a finite subfamily $(X_\phi)_{\phi \in F}$ such
that $Z \subset \cup_{\phi \in F} X_\phi$.


However, not all properties that hold in $\mathbb{R}^n$ under the Euclidean metric hold for
general metric spaces. For instance, an important result in $\mathbb{R}^n$ is that a set $Z$ is
compact if and only if it is closed and bounded. However, in arbitrary metric spaces,
it is possible for a set to be closed and bounded without the set being compact. Here
is an example:


Example C.21 As in Example C.2, let $V$ be the space of bounded sequences in $\mathbb{R}$.
Let $\|\cdot\|$ denote the sup-norm on $V$, i.e., for $x = (x_1, x_2, \ldots) \in V$, let

$$\|x\| = \sup_{i \in \mathbb{N}} |x_i|,$$

and let $d$ denote the corresponding metric:

$$d(x, y) = \|x - y\| = \sup_{i \in \mathbb{N}} |x_i - y_i|.$$

Let $e_i$ be the element of $V$ that contains a 1 in the $i$-th place, and zeros elsewhere.
Note that for any $i$ and $j$ with $i \neq j$, we have $d(e_i, e_j) = 1$. Let

$$X = \{e_i \mid i = 1, 2, \ldots\}.$$

The set $X$ is bounded since $\|e_i\| = 1$ for all $i$. Since $d(e_i, e_j) = 1$ for all $i \neq j$,
the only convergent sequences in $X$ are those that are eventually constant, and their
limits lie in $X$; so $X$ is closed. However, the sequence $\{e_i\}$ is a sequence in $X$ with
no convergent subsequence, so $X$ is not compact. □
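
The key fact in this example, that distinct unit sequences are always at sup-distance 1, can be checked on finite truncations. The sketch below is an illustration only: the helper `e` returns the first few coordinates of $e_i$, and the truncation length `L` is an arbitrary assumption.

```python
def e(i, length):
    """First `length` coordinates of e_i (1 in place i, zeros elsewhere); i is 1-based."""
    return tuple(1.0 if k == i else 0.0 for k in range(1, length + 1))

def d_sup(x, y):
    """Sup-norm distance between two truncated sequences of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))

L = 6
dists = {d_sup(e(i, L), e(j, L))
         for i in range(1, L + 1) for j in range(1, L + 1) if i != j}
print(dists)  # {1.0}
```

Because no two distinct terms ever get closer than 1, no subsequence of $\{e_i\}$ can be Cauchy, and hence none can converge.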


C.4.3 Continuous Functions on Metric Spaces 
Given metric spaces $(V_1, d_1)$ and $(V_2, d_2)$, a function $f\colon V_1 \to V_2$ is said to be
continuous at the point $v \in V_1$ if for all sequences $\{v_n\}$ in $V_1$ with
$v_n \to v$, it is the case that $f(v_n) \to f(v)$. Equivalently, $f$ is continuous at $v$ if for
all sequences $\{v_n\}$ in $V_1$,

$$d_1(v_n, v) \to 0 \text{ implies } d_2(f(v_n), f(v)) \to 0.$$

The function $f$ is said to be continuous on $V_1$ if $f$ is continuous at each $v \in V_1$.


By using virtually the same arguments as in the proof of Theorem 1.49 in Chapter 1, 
it can be shown that 
Theorem C.22 Let $f\colon V_1 \to V_2$, where $(V_1, d_1)$ and $(V_2, d_2)$ are metric spaces.
Then, $f$ is continuous at $v \in V_1$ if and only if for all open sets $U_2 \subset V_2$ such that
$f(v) \in U_2$, there is an open set $U_1 \subset V_1$ such that $v \in U_1$, and $f(x) \in U_2$ for all
$x \in U_1$.


As an immediate corollary of this result, we get the result that a function is
continuous if and only if “the inverse image of every open set is open”:


Corollary C.23 A function $f\colon V_1 \to V_2$ is continuous on $V_1$ if and only if for all
open sets $U_2 \subset V_2$, $f^{-1}(U_2) = \{x \in V_1 \mid f(x) \in U_2\}$ is an open set in $V_1$.


C.4.4 Separable Metric Spaces 


A point $x \in V$ is said to be a limit point of a set $X \subset V$ if for all $r > 0$, there is
$z \in V$, $z \neq x$, such that

$$z \in B(x, r) \cap X,$$

that is, if the open ball $B(x, r)$ contains a point of $X$ different from $x$.

The closure in $V$ of a set $X \subset V$ is the set $X$ together with all the limit points of
$X$. We will denote the closure of $X$ by $\mathrm{cl}(X)$. Note that we have $\mathrm{cl}(X) = X$ if and
only if the set $X$ is itself closed.

A set $W \subset V$ is said to be dense in $V$ if the closure of $W$ contains $V$, that is, if
$\mathrm{cl}(W) \supset V$. Every set is dense in itself. However, as we have shown in Appendix B,
every real number arises as the limit of some Cauchy sequence of rational numbers,
so the closure of the rationals $\mathbb{Q}$ is the entire real line $\mathbb{R}$. Thus, a dense subset of a
set $X$ could be very “small” in relation to $X$.

A metric space (V, d) is said to be separable if it possesses a countable dense set, 
that is, there exists a countable subset W of V such that cl(W) = V. For example, 
the set of rationals Q is countable and is dense in R, so R is separable. 
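
Density of the rationals can be illustrated by approximating a real number with fractions of bounded denominator. The sketch below uses the standard-library method `Fraction.limit_denominator`; the target (the floating-point value of the square root of 2) and the denominator bounds are assumptions chosen for the example.

```python
from fractions import Fraction
import math

target = math.sqrt(2)   # floating-point stand-in for an irrational number
errors = []
for max_den in (1, 10, 100, 1000):
    # Closest rational to the target with denominator at most max_den.
    q = Fraction(target).limit_denominator(max_den)
    errors.append(abs(float(q) - target))
    print(q, errors[-1])
```

The approximation error shrinks as larger denominators are allowed, in line with the statement that every real number is a limit of rationals.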

The following result gives us an equivalent way of identifying separability. It 
comes in handy frequently in applications. 


Theorem C.24 A metric space $(V, d)$ is separable if and only if there is a countable
family of open sets $\{Z_n\}$ such that for any open $Z \subset V$, it is the case that $Z =
\cup_{n \in N(Z)} Z_n$, where $N(Z) = \{n \mid Z_n \subset Z\}$.


Proof See Royden (1968, Chapter 7, p. 130). □
C.4.5 Subspaces


Let $(V, d)$ be a metric space, and let $W \subset V$. Define a metric $d_W$ on $W$ by $d_W(x, y) =
d(x, y)$ for $x, y \in W$. That is, $d_W$ is just the metric $d$ restricted to $W$. The space
$(W, d_W)$ is then called a (metric) subspace of $(V, d)$.

Let $(W, d_W)$ be a subspace of $(V, d)$. A set $X \subset W$ is said to be open in $W$ if
there exists an open set $Y \subset V$ such that $X = Y \cap W$. If $W$ is itself open in $V$, then
a subset $X$ of $W$ is open in $W$ if and only if it is open in $V$.

For example, consider the interval $W = (0, 1]$ with the metric $d_W(x, y) = |x - y|$
for $x, y \in W$. Then, $(W, d_W)$ is a subspace of the real line with the Euclidean metric.
The set $(\tfrac{1}{2}, 1]$ is open in $W$ since $(\tfrac{1}{2}, 1] = (\tfrac{1}{2}, \tfrac{3}{2}) \cap W$ and $(\tfrac{1}{2}, \tfrac{3}{2})$ is open in $\mathbb{R}$.

The following result relates separability of $(V, d)$ to separability of the subspace
$(W, d_W)$:


Theorem C.25 Every subspace of a separable metric space is itself separable. 


Proof Let $(V, d)$ be a separable metric space, and let $(W, d_W)$ be a subspace of
$(V, d)$. Since $V$ is separable, there exists, by Theorem C.24, a countable collection
of open sets $(Z_n)_{n \in \mathbb{N}}$ such that every open set in $V$ can be expressed as the union of
sets from this collection. Define the family $(Y_n)_{n \in \mathbb{N}}$ by $Y_n = Z_n \cap W$. Then, $Y_n$ is
open in $W$ for each $n$. Now suppose a set $X$ is open in $W$. Then, there exists a set $X'$
open in $V$ such that $X = X' \cap W$. Therefore, there exists $K \subset \mathbb{N}$ such that

$$X' = \cup_{i \in K} Z_i.$$

It follows that

$$X = \cup_{i \in K} Y_i.$$

Thus, we have shown that an arbitrary open set in $W$ can be expressed as the union
of open sets drawn from the countable collection $(Y_n)_{n \in \mathbb{N}}$. By Theorem C.24, $W$ is
separable. □


Finally, our last result of this section relates the notions of “completeness” and
“closedness” of a metric space:


Theorem C.26 If $(W, d_W)$ is complete, then $W$ is a closed subset of $V$. On the other
hand, if $W$ is a closed subset of $V$, and $V$ is complete, then $(W, d_W)$ is also complete.


Proof Left as an exercise. □
C.5 Topological Spaces
C.5.1 Definitions 


Unlike a metric space $(V, d)$, which uses the metric $d$ to determine which classes
of sets are open, and which are closed or compact, a topological space takes as
the defining primitive on a vector space $V$ the notion of open sets. Formally, a
topological space $(V, \tau)$ is a vector space $V$ and a collection $\tau$ of subsets of $V$ that
satisfy


1. $\emptyset, V \in \tau$.
2. $O_1, O_2 \in \tau$ implies $O_1 \cap O_2 \in \tau$.
3. For any index set $A$, $O_\alpha \in \tau$ for $\alpha \in A$ implies $\cup_{\alpha \in A} O_\alpha \in \tau$.


The sets in $\tau$ are called the “open sets” of $V$ (under $\tau$); $\tau$ itself is called a topology
on $V$.
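
On a finite set the three conditions can be verified mechanically, since any union of members of a finite collection is a finite union. The sketch below is an illustration under that finiteness assumption; the function name `is_topology` and the example collections are not from the text.

```python
from itertools import combinations

def is_topology(V, tau):
    """Check the three axioms for a collection tau of subsets of a finite set V."""
    V = frozenset(V)
    tau = {frozenset(O) for O in tau}
    # Axiom 1: the empty set and V itself belong to tau.
    if frozenset() not in tau or V not in tau:
        return False
    # Every member of tau must be a subset of V.
    if any(not O <= V for O in tau):
        return False
    # Axioms 2 and 3: for a finite collection, closure under pairwise
    # intersections and pairwise unions is enough.
    for A, B in combinations(tau, 2):
        if A & B not in tau or A | B not in tau:
            return False
    return True

V = {1, 2, 3}
trivial = [set(), V]
not_a_topology = [set(), {1}, {2}, V]   # the union {1, 2} is missing
print(is_topology(V, trivial), is_topology(V, not_a_topology))  # True False
```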

As we have seen in the previous subsection, if the open sets of $V$ are identified
using a metric $d$ on $V$, then the collection of open sets generated by $d$ will have
the three properties required above. Thus, every metric space $(V, d)$ gives rise to a
topological space $(V, \tau)$. If a topological space $(V, \tau)$ is generated by a metric space
$(V, d)$ in this manner, then the topological space is said to be metrizable.

It is possible that two different metric spaces $(V, d_1)$ and $(V, d_2)$ give rise to the
same topological space $(V, \tau)$, since two different metrics may generate the same
class of open sets (see the Exercises). Thus, even if a topological space is metrizable,
there may not be a unique metric space associated with it. It is also possible that a
topological space is not metrizable. For instance:


Example C.27 Let $V$ be any vector space consisting of at least two points. Define
$\tau$ to be the two-point set $\tau = \{\emptyset, V\}$. Then, it is easily verified that $\tau$ meets the three
conditions required of a topology ($\tau$ is called the trivial topology on $V$). We leave it
as an exercise to the reader to verify that there cannot exist a metric $d$ on $V$ such that
the only open sets of $V$ under $d$ are $\emptyset$ and $V$ itself.¹ □


Thus, a topological space is a generalization of the concept of a metric space. 

Finally, a small point. Even if a topological space is metrizable, the topology could 
be quite “large.” For instance, under the topology generated on V by the discrete 
metric (the discrete topology), every point of V, and therefore every subset of V, is 
open. 


¹ If there were such a metric, it must satisfy $d(x, y) = 0$ for all $x$ and $y$ in $V$, but this violates the condition
required of a metric that $d(x, y) > 0$ if $x \neq y$.
C.5.2 Sets and Sequences in Topological Spaces


A sequence $\{x_n\}$ in a topological space $(V, \tau)$ is said to converge to a limit $x \in V$
if for all open sets $O \in \tau$ such that $x \in O$, there exists $N$ such that for all $n \geq N$,
$x_n \in O$.

A set $X \subset V$ is closed if $X^c$ is open (i.e., if $X^c \in \tau$). It is easy to show the
following properties:


Theorem C.28 Let $(V, \tau)$ be a topological space. Then, $\emptyset$ and $V$ are both closed.


Theorem C.29 Let $(V, \tau)$ be a topological space. Let $(X_\phi)_{\phi \in F}$ be a collection of
closed sets in $V$, where $F$ is a finite index set. Then, $\cup_{\phi \in F} X_\phi$ is also closed.


Theorem C.30 Let $(V, \tau)$ be a topological space. Let $(X_\alpha)_{\alpha \in A}$ be a collection of
closed sets in $V$, where $A$ is an arbitrary index set. Then, $\cap_{\alpha \in A} X_\alpha$ is also closed.


A subset $W$ of a topological space $(V, \tau)$ is said to be compact if every open cover
of $W$ has a finite subcover. It is not very hard to see that if $(V, \tau)$ is metrizable,
then a set $W$ is compact in the sequential sense (i.e., every sequence in $W$ contains a
convergent subsequence) if and only if it is topologically compact (every open cover
of $W$ has a finite subcover). If a topological space is not metrizable, this is no longer
true.²


C.5.3 Continuous Functions on Topological Spaces 

Let $(V, \tau)$ and $(V', \tau')$ be topological spaces, and let $f\colon V \to V'$. Then, $f$ is said
to be continuous at $x \in V$ if for all $O' \in \tau'$ such that $f(x) \in O'$, there is $O \in \tau$
such that $x \in O$ and $f(O) \subset O'$. Further, $f$ is continuous on $V$ if for all $O' \in \tau'$,
we have $f^{-1}(O') \in \tau$.
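
When both spaces are finite, the condition for continuity on $V$ (the inverse image of every open set is open) can be checked directly. The sketch below is illustrative; the function names `preimage` and `is_continuous`, and the representation of $f$ as a dictionary, are assumptions.

```python
def preimage(f, V, O):
    """Inverse image of the set O under f, where f is a dict on the finite set V."""
    return frozenset(x for x in V if f[x] in O)

def is_continuous(f, V, tau_V, tau_W):
    """True iff the preimage of every set in tau_W belongs to tau_V."""
    opens_V = {frozenset(O) for O in tau_V}
    return all(preimage(f, V, O) in opens_V for O in tau_W)

V = {1, 2}
discrete = [set(), {1}, {2}, {1, 2}]
trivial = [set(), {1, 2}]
identity = {1: 1, 2: 2}
print(is_continuous(identity, V, discrete, trivial))  # True
print(is_continuous(identity, V, trivial, discrete))  # False
```

The same map is continuous in one direction and not in the other, because continuity depends on the topologies placed on the domain and range and not on the map alone.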

It is easy to see that if $(V, \tau)$ and $(V', \tau')$ are metrizable spaces, then a function $f$ is
continuous at $x$ in the topological sense given above if and only if $f$ is sequentially
continuous at $x$ (i.e., for all sequences $x_n \to x$ we have $f(x_n) \to f(x)$). We
leave it to the reader to examine if this remains true if $(V, \tau)$ and/or $(V', \tau')$ are not
metrizable.


C.5.4 Bases 
A base at $x$ for a topological space $(V, \tau)$ is a collection of open sets $\mathcal{B}_x \subset \tau$ such that

$$(O \in \tau,\ x \in O) \text{ implies there is } B \in \mathcal{B}_x \text{ such that } x \in B \subset O.$$


² In a sense this is analogous to the result that while closed and bounded sets in $\mathbb{R}^n$ are compact, this is not
true in general metric spaces.
In words, $\mathcal{B}_x$ is a base for $(V, \tau)$ at $x$ if every open set containing $x$ is a superset of
some set in the base that also contains $x$.


For example, suppose $(V, \tau)$ is a metrizable topological space with metric $d$. Pick
any $x \in V$, and define $\mathcal{B}_x$ by

$$\mathcal{B}_x = \{B \mid B = B(x, r) \text{ for some } r > 0\},$$

where, of course, $B(x, r)$ represents the open ball with center $x$ and radius $r$ in the
metric $d$. Since the metric $d$ gives rise to the topology $\tau$, it follows from the definition
of an open set in metric spaces that $\mathcal{B}_x$ forms a base at $x$. Thus, the concept of a base
at $x$ is simply the topological analogue of the metric space notion of the open balls
$B(x, r)$ around $x$.

A base for a topological space $(V, \tau)$ is a collection $\mathcal{B} \subset \tau$ of open sets such that
$\mathcal{B}$ is a base at $x$ for each $x \in V$. It is easy to see that


Proposition C.31 Let $\mathcal{B}$ be a base for $(V, \tau)$. Then, $O \in \tau$ if and only if for each
$x \in O$, there is $B \in \mathcal{B}$ such that $x \in B \subset O$.


Proof Necessity (the “only if” part) follows from the definition of a base $\mathcal{B}$ for
$(V, \tau)$. To see sufficiency, suppose $O \subset V$ is such that if $x \in O$, then there is $B \in \mathcal{B}$
with $x \in B \subset O$. Then, we must have

$$O = \cup \{B \in \mathcal{B} \mid B \subset O\}.$$

Therefore, $O$ is the union of open sets, and must be open. That is, $O \in \tau$. □
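
The criterion of Proposition C.31 is easy to apply on a finite set. The sketch below tests whether each point of a candidate set sits inside some base set that is itself contained in the candidate; the function name `is_open_via_base` and the two-set base on $\{1, 2, 3\}$ are hypothetical choices for illustration.

```python
def is_open_via_base(O, base):
    """Every x in O must lie in some base set B with B contained in O."""
    O = frozenset(O)
    return all(any(x in B and frozenset(B) <= O for B in base) for x in O)

base = [{1}, {2, 3}]   # hypothetical base for the topology {∅, {1}, {2, 3}, V} on V = {1, 2, 3}
print(is_open_via_base({1, 2, 3}, base), is_open_via_base({1, 2}, base))  # True False
```

The set $\{1, 2\}$ fails because the only base set containing the point 2 is $\{2, 3\}$, which is not contained in $\{1, 2\}$.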


A base $\mathcal{B}$ is said to be a countable base if it contains countably many sets. A
topological space $(V, \tau)$ is said to be first-countable, or to satisfy the first axiom of
countability, if for each point $x$, there exists a countable base $\mathcal{B}_x$ at $x$. It is said to be
second-countable, or to satisfy the second axiom of countability, if it has a countable
base $\mathcal{B}$.

The following result relates separability to the notion of a countable base. 


Theorem C.32 Let $(V, \tau)$ be a metrizable topological space with metric $d$. Then,
1. $(V, \tau)$ is first-countable.
2. $(V, \tau)$ is second-countable if and only if the metric space $(V, d)$ is separable.


Proof If at each $x \in V$, we take the open balls $B(x, r)$ for rational $r > 0$, then
the collection $\mathcal{B}_x$ of all such open balls forms a base at $x$ which is countable. Thus,
every metric space is first-countable, which proves Part 1. Part 2 is an immediate
consequence of Theorem C.24. □
C.6 Exercises


1. Show that if a norm $\|\cdot\|$ on a vector space $V$ is generated by an inner product
$\langle \cdot, \cdot \rangle$, then the polarization identity holds, i.e., show that for all $x$ and $y$ in $V$, we
have

$$\langle x, y \rangle = \tfrac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right).$$

Hint: This follows by expanding $\langle x + y, x + y \rangle = \|x + y\|^2$.


2. Using the polarization identity, prove the Parallelogram Law.


3. Of all the $p$-norms on $\mathbb{R}^n$, only one is generated by an inner product: the
Euclidean ($p = 2$) norm. Show that, for example, $\|\cdot\|_1$ is not generated by
an inner product. One way to do this is to show that the Parallelogram Law is
violated for some pair $x$ and $y$.


4. Let $d_1$ and $d_2$ be metrics on a vector space $V$. Define $d_1$ and $d_2$ to be equivalent
if there exist constants $c_1$ and $c_2$ in $\mathbb{R}$ such that for all $x, y \in V$, we have

$$d_1(x, y) \leq c_1 d_2(x, y),$$
$$d_2(x, y) \leq c_2 d_1(x, y).$$

Show that if $d_1$ and $d_2$ are equivalent metrics on $V$, then the following hold:

(a) A sequence $\{x_n\}$ in $V$ converges to a limit $x$ in $(V, d_1)$ if and only if $\{x_n\}$
converges to $x$ in $(V, d_2)$.
(b) A set $X \subset V$ is open under $d_1$ if and only if it is open under $d_2$.


5. Give an example of two metrics $d_1$ and $d_2$ on a vector space $V$ that generate the
same open sets but are not equivalent.


6. Which of the metrics induced by the $p$-norms $\|\cdot\|_p$ on $\mathbb{R}^2$ are equivalent?


7. Is the equivalence of metrics a transitive relationship? That is, is it true that if $d_1$
is equivalent to $d_2$, and $d_2$ is equivalent to $d_3$, then $d_1$ is equivalent to $d_3$?


8. Show that the discrete metric on $\mathbb{R}$ does, in fact, meet the three properties required
of a metric.


9. Which subsets of $\mathbb{R}$ are closed sets in the discrete metric? Which are compact?


10. Prove that under the discrete metric, $(V, d)$ is separable if and only if $V$ is
countable.


11. Let $(V_1, d_1)$ denote $\mathbb{R}$ with the discrete metric and $(V_2, d_2)$ denote $\mathbb{R}$ with the
Euclidean metric. Let $F$ denote the class of continuous functions from $V_1$ to $V_2$,
$G$ the class of continuous functions from $V_1$ to $V_1$, and $H$ the class of continuous
functions from $V_2$ to $V_2$. Compare $F$, $G$, and $H$. Which is the largest set (in terms
of containment)? Which is the second largest?
12. How does the class of continuous functions on $\mathbb{R}^n$ under the sup-norm metric
compare to those on $\mathbb{R}^n$ under the Euclidean metric? (Assume in both cases that
the functions map $\mathbb{R}^n$ into $\mathbb{R}$, and that the range space $\mathbb{R}$ is given the Euclidean
metric.)


13. (The $p$-adic metric) Fix a prime $p$. Any rational $r$ can be written as $r = p^{-n}(a/b)$
where $n$, $a$, $b$ are integers and $p$ does not divide $a$ or $b$. Let $|r|_p = p^n$. This,
the $p$-adic “valuation,” is not a norm. For $r$, $q$ rational, let the $p$-adic metric
be defined by $d(r, q) = |r - q|_p$. Show that the $p$-adic valuation obeys $|q + r|_p
\leq \max\{|q|_p, |r|_p\}$. Use this to show that the $p$-adic metric satisfies the
triangle inequality. Verify the other requirements for a metric. Finally, show that
if $r \in B(q, \epsilon)$ then $B(q, \epsilon) \subset B(r, \epsilon)$. It follows that $B(q, \epsilon) = B(r, \epsilon)$, so every
point in an open ball around $r$ in the $p$-adic metric is also the center point of the
ball!


14. Let $V$ be a vector space. Suppose $V$ is given the discrete topology. Which subsets
of $V$ are compact?


15. Let $V$ be a vector space. Suppose $V$ is given the discrete topology. Which
functions $f\colon V \to \mathbb{R}$ are continuous on $V$, if $\mathbb{R}$ is given the Euclidean topology?


16. Let $(X, d_1)$ and $(Y, d_2)$ be metric spaces. Let $Z = X \times Y$. Define a metric (the
product metric) on $Z$ by

$$d[(x, y), (x', y')] = \left[ d_1(x, x')^2 + d_2(y, y')^2 \right]^{1/2}.$$

Show that the product metric is, in fact, a metric.


17. Let $(V, \tau)$ and $(V', \tau')$ be topological spaces, and let $W = V \times V'$. The product
topology $\tau_W$ on $W$ is defined as the topology that takes as open, sets of the form

$$O \times O', \quad O \in \tau, \quad O' \in \tau',$$

and completes the topology by taking the finite intersection and arbitrary union
of all such sets. Show that if $(V, \tau)$ and $(V', \tau')$ are topological spaces that
are metrizable (with metrics $d$ and $d'$, say), then the product metric defines the
product topology.


18. A set $X \subset V$ in a topological space $(V, \tau)$ is said to be sequentially compact if
every sequence of points in $X$ contains a subsequence converging to a point of $X$.

(a) Show that if $(V, \tau)$ is a metrizable space (say, with metric $d$), then $X$ is
sequentially compact if and only if it is compact, that is, if and only if every
open cover of $X$ contains a finite subcover.

(b) Give an example of a nonmetrizable topological space $(V, \tau)$ in which there
is a sequentially compact set that is not compact.


19. Show that if $(V, \tau)$ and $(V', \tau')$ are metrizable topological spaces, then a function
$f\colon V \to V'$ is continuous at a point $x$ in the topological sense if and only if $f$
is sequentially continuous at $x$, i.e., if and only if for all sequences $x_n \to x$, we
have $f(x_n) \to f(x)$. Is this true if $(V, \tau)$ and/or $(V', \tau')$ are not metrizable?

20. Let $V = [0, 1]$. Let $\tau$ be the collection of sets consisting of $\emptyset$, $V$, and all subsets
of $V$ whose complements are finite sets. Show that $\tau$ is a topology for $V$. Show
also that $(V, \tau)$ is not first-countable.